
On Thu, Dec 07, 2023 at 08:52:39PM +0800, tugy@chinatelecom.cn wrote:
From: Guoyi Tu <tugy@chinatelecom.cn>
Currently, libvirt creates a thread pool with only on thread to handle all qemu monitor events for virtual machines, In the cases that if the thread gets stuck while handling a monitor EOF event, such as unable to kill the virtual machine process or release resources, the events of other virtual machine will be also blocked, which will lead to the abnormal behavior of other virtual machines.
For instance, when another virtual machine completes a shutdown operation and the monitor EOF event has been queued but remains unprocessed, we immediately destroy and start the virtual machine again, at a later time when EOF event get processed, the processMonitorEOFEvent() will kill the virtual machine that just started.
To address this issue, in the processMonitorEOFEvent(), we check whether the current virtual machine's id is equal to the the one at the time the event was generated. If they do not match, we immediately return.
Signed-off-by: Guoyi Tu <tugy@chinatelecom.cn> Signed-off-by: dengpengcheng <dengpc12@chinatelecom.cn> --- src/qemu/qemu_domain.c | 2 +- src/qemu/qemu_driver.c | 11 +++++++++-- src/qemu/qemu_process.c | 2 +- 3 files changed, 11 insertions(+), 4 deletions(-)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 64afae6450..cfc5b79657 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -3854,7 +3854,8 @@ processJobStatusChangeEvent(virDomainObj *vm,
static void processMonitorEOFEvent(virQEMUDriver *driver, - virDomainObj *vm) + virDomainObj *vm, + int domid) { qemuDomainObjPrivate *priv = vm->privateData; int eventReason = VIR_DOMAIN_EVENT_STOPPED_SHUTDOWN; @@ -3863,6 +3864,12 @@ processMonitorEOFEvent(virQEMUDriver *driver, unsigned int stopFlags = 0; virObjectEvent *event = NULL;
+ if (vm->def->id != domid) { + VIR_ERROR("Domain %s was restarted, ignoring EOF", + vm->def->name); + return; + }
Since this race scenario is something we expect to happen periodically, we don't want VIR_ERROR to be used, as it can make admins (incorrectly) believe something is broken. I'm going to replace this with VIR_DEBUG when I merge it. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|