[libvirt] [PATCH 0/6] libxl bug and race fixes

This series fixes several bugs and races in the libxl driver, as described in the commit messages of the individual patches. One of the races was discussed on the xen-devel ml [1] and resulted in two fixes in libxl [2,3]. I've tested this series with and without the libxl patches. All of the previously discovered races are fixed with this series and the libxl patches. It is possible to fire asserts in libxl with this series and without the libxl patches, but that is no worse than the current behavior. The libxl patches have not yet been committed upstream, hence the 'XXXX' for commit id in patch 1. I will respond to the xen-devel thread now with an ACK on the libxl patches (and a reference to this series) so those can be committed. I'll fixup the commit message for patch 1 later, but wanted to get this series on the list for review before the 1.0.2 freeze. [1] http://lists.xen.org/archives/html/xen-devel/2012-11/msg01016.html [2] http://lists.xen.org/archives/html/xen-devel/2012-12/msg00684.html [3] http://lists.xen.org/archives/html/xen-devel/2012-12/msg00685.html Jim Fehlig (6): libxl: Fix handling of timeouts libxl: Fix races in libxl event code libxl: Fix race between destruction of objects libxl: Explicitly remove timeouts libxl: Fix removing non-persistent domain after save libxl: Find domain object in event handler src/libxl/libxl_conf.h | 9 ++ src/libxl/libxl_driver.c | 298 ++++++++++++++++++++++++++++++++++------------- 2 files changed, 226 insertions(+), 81 deletions(-) -- 1.8.0.1

xen-unstable commit XXXX makes changes wrt modifying and deregistering timeouts. First, timeout modify callbacks will only be invoked with an abs_t of {0,0}, i.e. make the timeout fire immediately. Prior to this commit, timeout modify callbacks were never invoked. Second, timeout deregister hooks will no longer be called. This patch makes changes in the libvirt libxl driver that should be compatible before and after commit XXXX. While at it, fix a potential overflow in the timeout register callback. --- 'commit XXXX' will be replaced with a proper commit id once committed to xen-unstable. src/libxl/libxl_driver.c | 39 ++++++++++++++++++++++++++++++--------- 1 file changed, 30 insertions(+), 9 deletions(-) diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c index a8c4cae..d4f38e3 100644 --- a/src/libxl/libxl_driver.c +++ b/src/libxl/libxl_driver.c @@ -186,7 +186,13 @@ libxlTimerCallback(int timer ATTRIBUTE_UNUSED, void *timer_info) { struct libxlOSEventHookTimerInfo *info = timer_info; + /* libxl expects the event to be deregistered when calling + libxl_osevent_occurred_timeout, but we dont want the event info + destroyed. Disable the timeout and only remove it after returning + from libxl. */ + virEventUpdateTimeout(info->id, -1); libxl_osevent_occurred_timeout(info->priv->ctx, info->xl_priv); + virEventRemoveTimeout(info->id); } static void @@ -202,6 +208,8 @@ libxlTimeoutRegisterEventHook(void *priv, void *xl_priv) { struct timeval now; + struct timeval res; + static struct timeval zero; struct libxlOSEventHookTimerInfo *timer_info; int timeout, timer_id; @@ -211,8 +219,15 @@ libxlTimeoutRegisterEventHook(void *priv, } gettimeofday(&now, NULL); - timeout = (abs_t.tv_usec - now.tv_usec) / 1000; - timeout += (abs_t.tv_sec - now.tv_sec) * 1000; + timersub(&abs_t, &now, &res); + /* Ensure timeout is not overflowed */ + if (timercmp(&res, &zero, <)) { + timeout = 0; + } else if (res.tv_sec > INT_MAX / 1000) { + timeout = INT_MAX; + } else { + timeout = res.tv_sec * 1000 + (res.tv_usec + 999) / 1000; + } timer_id = virEventAddTimeout(timeout, libxlTimerCallback, timer_info, libxlTimerInfoFree); if (timer_id < 0) { @@ -227,19 +242,25 @@ libxlTimeoutRegisterEventHook(void *priv, return 0; } +/* + * Note: There are two changes wrt timeouts starting with xen-unstable + * commit XXXX: + * + * 1. Timeout modify callbacks will only be invoked with an abs_t of {0,0}, + * i.e. make the timeout fire immediately. Prior to this commit, timeout + * modify callbacks were never invoked. + * + * 2. Timeout deregister hooks will no longer be called. + */ static int libxlTimeoutModifyEventHook(void *priv ATTRIBUTE_UNUSED, void **hndp, - struct timeval abs_t) + struct timeval abs_t ATTRIBUTE_UNUSED) { - struct timeval now; - int timeout; struct libxlOSEventHookTimerInfo *timer_info = *hndp; - gettimeofday(&now, NULL); - timeout = (abs_t.tv_usec - now.tv_usec) / 1000; - timeout += (abs_t.tv_sec - now.tv_sec) * 1000; - virEventUpdateTimeout(timer_info->id, timeout); + /* Make the timeout fire */ + virEventUpdateTimeout(timer_info->id, 0); return 0; } -- 1.8.0.1

Jim Fehlig wrote:
xen-unstable commit XXXX makes changes wrt modifying and deregistering timeouts.
First, timeout modify callbacks will only be invoked with an abs_t of {0,0}, i.e. make the timeout fire immediately. Prior to this commit, timeout modify callbacks were never invoked.
Second, timeout deregister hooks will no longer be called.
This patch makes changes in the libvirt libxl driver that should be compatible before and after commit XXXX.
While at it, fix a potential overflow in the timeout register callback. ---
'commit XXXX' will be replaced with a proper commit id once committed to xen-unstable.
libxl patch as been committed to xen-unstable now, changeset 26469. I've updated this patch accordingly. Regards, Jim

On Thu, Jan 24, 2013 at 12:55:09PM -0700, Jim Fehlig wrote:
Jim Fehlig wrote:
xen-unstable commit XXXX makes changes wrt modifying and deregistering timeouts.
First, timeout modify callbacks will only be invoked with an abs_t of {0,0}, i.e. make the timeout fire immediately. Prior to this commit, timeout modify callbacks were never invoked.
Second, timeout deregister hooks will no longer be called.
This patch makes changes in the libvirt libxl driver that should be compatible before and after commit XXXX.
While at it, fix a potential overflow in the timeout register callback. ---
'commit XXXX' will be replaced with a proper commit id once committed to xen-unstable.
libxl patch as been committed to xen-unstable now, changeset 26469. I've updated this patch accordingly.
Regards, Jim
From 451c84284fa1b4f311f2ceb052c8e9f25ffa0cf0 Mon Sep 17 00:00:00 2001 From: Jim Fehlig <jfehlig@suse.com> Date: Mon, 21 Jan 2013 09:59:28 -0700 Subject: [PATCH 1/6] libxl: Fix handling of timeouts
xen-unstable changeset 26469 makes changes wrt modifying and deregistering timeouts.
First, timeout modify callbacks will only be invoked with an abs_t of {0,0}, i.e. make the timeout fire immediately. Prior to this commit, timeout modify callbacks were never invoked.
Second, timeout deregister hooks will no longer be called.
This patch makes changes in the libvirt libxl driver that should be compatible before and after changeset 26469.
While at it, fix a potential overflow in the timeout register callback. --- src/libxl/libxl_driver.c | 39 ++++++++++++++++++++++++++++++--------- 1 file changed, 30 insertions(+), 9 deletions(-)
diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c index a8c4cae..77026fc 100644 --- a/src/libxl/libxl_driver.c +++ b/src/libxl/libxl_driver.c @@ -186,7 +186,13 @@ libxlTimerCallback(int timer ATTRIBUTE_UNUSED, void *timer_info) { struct libxlOSEventHookTimerInfo *info = timer_info;
+ /* libxl expects the event to be deregistered when calling + libxl_osevent_occurred_timeout, but we dont want the event info + destroyed. Disable the timeout and only remove it after returning + from libxl. */ + virEventUpdateTimeout(info->id, -1); libxl_osevent_occurred_timeout(info->priv->ctx, info->xl_priv); + virEventRemoveTimeout(info->id); }
static void @@ -202,6 +208,8 @@ libxlTimeoutRegisterEventHook(void *priv, void *xl_priv) { struct timeval now; + struct timeval res; + static struct timeval zero; struct libxlOSEventHookTimerInfo *timer_info; int timeout, timer_id;
@@ -211,8 +219,15 @@ libxlTimeoutRegisterEventHook(void *priv, }
gettimeofday(&now, NULL); - timeout = (abs_t.tv_usec - now.tv_usec) / 1000; - timeout += (abs_t.tv_sec - now.tv_sec) * 1000; + timersub(&abs_t, &now, &res); + /* Ensure timeout is not overflowed */ + if (timercmp(&res, &zero, <)) { + timeout = 0; + } else if (res.tv_sec > INT_MAX / 1000) { + timeout = INT_MAX; + } else { + timeout = res.tv_sec * 1000 + (res.tv_usec + 999) / 1000; + } timer_id = virEventAddTimeout(timeout, libxlTimerCallback, timer_info, libxlTimerInfoFree); if (timer_id < 0) { @@ -227,19 +242,25 @@ libxlTimeoutRegisterEventHook(void *priv, return 0; }
+/* + * Note: There are two changes wrt timeouts starting with xen-unstable + * changeset 26469: + * + * 1. Timeout modify callbacks will only be invoked with an abs_t of {0,0}, + * i.e. make the timeout fire immediately. Prior to this commit, timeout + * modify callbacks were never invoked. + * + * 2. Timeout deregister hooks will no longer be called. + */ static int libxlTimeoutModifyEventHook(void *priv ATTRIBUTE_UNUSED, void **hndp, - struct timeval abs_t) + struct timeval abs_t ATTRIBUTE_UNUSED) { - struct timeval now; - int timeout; struct libxlOSEventHookTimerInfo *timer_info = *hndp;
- gettimeofday(&now, NULL); - timeout = (abs_t.tv_usec - now.tv_usec) / 1000; - timeout += (abs_t.tv_sec - now.tv_sec) * 1000; - virEventUpdateTimeout(timer_info->id, timeout); + /* Make the timeout fire */ + virEventUpdateTimeout(timer_info->id, 0); return 0; }
ACK, I assume you tested with the older libxl, right ? Daniel -- Daniel Veillard | Open Source and Standards, Red Hat veillard@redhat.com | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | virtualization library http://libvirt.org/

Daniel Veillard wrote:
On Thu, Jan 24, 2013 at 12:55:09PM -0700, Jim Fehlig wrote:
Jim Fehlig wrote:
xen-unstable commit XXXX makes changes wrt modifying and deregistering timeouts.
First, timeout modify callbacks will only be invoked with an abs_t of {0,0}, i.e. make the timeout fire immediately. Prior to this commit, timeout modify callbacks were never invoked.
Second, timeout deregister hooks will no longer be called.
This patch makes changes in the libvirt libxl driver that should be compatible before and after commit XXXX.
While at it, fix a potential overflow in the timeout register callback. ---
'commit XXXX' will be replaced with a proper commit id once committed to xen-unstable.
libxl patch as been committed to xen-unstable now, changeset 26469. I've updated this patch accordingly.
Regards, Jim
From 451c84284fa1b4f311f2ceb052c8e9f25ffa0cf0 Mon Sep 17 00:00:00 2001 From: Jim Fehlig <jfehlig@suse.com> Date: Mon, 21 Jan 2013 09:59:28 -0700 Subject: [PATCH 1/6] libxl: Fix handling of timeouts
xen-unstable changeset 26469 makes changes wrt modifying and deregistering timeouts.
First, timeout modify callbacks will only be invoked with an abs_t of {0,0}, i.e. make the timeout fire immediately. Prior to this commit, timeout modify callbacks were never invoked.
Second, timeout deregister hooks will no longer be called.
This patch makes changes in the libvirt libxl driver that should be compatible before and after changeset 26469.
While at it, fix a potential overflow in the timeout register callback. --- src/libxl/libxl_driver.c | 39 ++++++++++++++++++++++++++++++--------- 1 file changed, 30 insertions(+), 9 deletions(-)
diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c index a8c4cae..77026fc 100644 --- a/src/libxl/libxl_driver.c +++ b/src/libxl/libxl_driver.c @@ -186,7 +186,13 @@ libxlTimerCallback(int timer ATTRIBUTE_UNUSED, void *timer_info) { struct libxlOSEventHookTimerInfo *info = timer_info;
+ /* libxl expects the event to be deregistered when calling + libxl_osevent_occurred_timeout, but we dont want the event info + destroyed. Disable the timeout and only remove it after returning + from libxl. */ + virEventUpdateTimeout(info->id, -1); libxl_osevent_occurred_timeout(info->priv->ctx, info->xl_priv); + virEventRemoveTimeout(info->id); }
static void @@ -202,6 +208,8 @@ libxlTimeoutRegisterEventHook(void *priv, void *xl_priv) { struct timeval now; + struct timeval res; + static struct timeval zero; struct libxlOSEventHookTimerInfo *timer_info; int timeout, timer_id;
@@ -211,8 +219,15 @@ libxlTimeoutRegisterEventHook(void *priv, }
gettimeofday(&now, NULL); - timeout = (abs_t.tv_usec - now.tv_usec) / 1000; - timeout += (abs_t.tv_sec - now.tv_sec) * 1000; + timersub(&abs_t, &now, &res); + /* Ensure timeout is not overflowed */ + if (timercmp(&res, &zero, <)) { + timeout = 0; + } else if (res.tv_sec > INT_MAX / 1000) { + timeout = INT_MAX; + } else { + timeout = res.tv_sec * 1000 + (res.tv_usec + 999) / 1000; + } timer_id = virEventAddTimeout(timeout, libxlTimerCallback, timer_info, libxlTimerInfoFree); if (timer_id < 0) { @@ -227,19 +242,25 @@ libxlTimeoutRegisterEventHook(void *priv, return 0; }
+/* + * Note: There are two changes wrt timeouts starting with xen-unstable + * changeset 26469: + * + * 1. Timeout modify callbacks will only be invoked with an abs_t of {0,0}, + * i.e. make the timeout fire immediately. Prior to this commit, timeout + * modify callbacks were never invoked. + * + * 2. Timeout deregister hooks will no longer be called. + */ static int libxlTimeoutModifyEventHook(void *priv ATTRIBUTE_UNUSED, void **hndp, - struct timeval abs_t) + struct timeval abs_t ATTRIBUTE_UNUSED) { - struct timeval now; - int timeout; struct libxlOSEventHookTimerInfo *timer_info = *hndp;
- gettimeofday(&now, NULL); - timeout = (abs_t.tv_usec - now.tv_usec) / 1000; - timeout += (abs_t.tv_sec - now.tv_sec) * 1000; - virEventUpdateTimeout(timer_info->id, timeout); + /* Make the timeout fire */ + virEventUpdateTimeout(timer_info->id, 0); return 0; }
ACK, I assume you tested with the older libxl, right ?
Yes, but with limited success due to the issue described here http://lists.xen.org/archives/html/xen-devel/2012-11/msg01016.html With Ian Jackson's libxl fixes, I've been able to do several hundred save/restore iterations. Without his libxl fixes, I'd be lucky to get beyond 5 iterations. http://lists.xen.org/archives/html/xen-devel/2013-01/msg01971.html Those libxl fixes will be included in the xen 4.2 branch too, and will be available in 4.2.2. Thanks! Jim

On Fri, Jan 25, 2013 at 10:21:12AM -0700, Jim Fehlig wrote:
Daniel Veillard wrote:
On Thu, Jan 24, 2013 at 12:55:09PM -0700, Jim Fehlig wrote:
Jim Fehlig wrote:
xen-unstable commit XXXX makes changes wrt modifying and deregistering timeouts.
First, timeout modify callbacks will only be invoked with an abs_t of {0,0}, i.e. make the timeout fire immediately. Prior to this commit, timeout modify callbacks were never invoked.
Second, timeout deregister hooks will no longer be called.
This patch makes changes in the libvirt libxl driver that should be compatible before and after commit XXXX.
While at it, fix a potential overflow in the timeout register callback. ---
'commit XXXX' will be replaced with a proper commit id once committed to xen-unstable.
libxl patch as been committed to xen-unstable now, changeset 26469. I've updated this patch accordingly.
Regards, Jim
From 451c84284fa1b4f311f2ceb052c8e9f25ffa0cf0 Mon Sep 17 00:00:00 2001 From: Jim Fehlig <jfehlig@suse.com> Date: Mon, 21 Jan 2013 09:59:28 -0700 Subject: [PATCH 1/6] libxl: Fix handling of timeouts
xen-unstable changeset 26469 makes changes wrt modifying and deregistering timeouts.
First, timeout modify callbacks will only be invoked with an abs_t of {0,0}, i.e. make the timeout fire immediately. Prior to this commit, timeout modify callbacks were never invoked.
Second, timeout deregister hooks will no longer be called.
This patch makes changes in the libvirt libxl driver that should be compatible before and after changeset 26469.
While at it, fix a potential overflow in the timeout register callback. --- src/libxl/libxl_driver.c | 39 ++++++++++++++++++++++++++++++--------- 1 file changed, 30 insertions(+), 9 deletions(-)
diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c index a8c4cae..77026fc 100644 --- a/src/libxl/libxl_driver.c +++ b/src/libxl/libxl_driver.c @@ -186,7 +186,13 @@ libxlTimerCallback(int timer ATTRIBUTE_UNUSED, void *timer_info) { struct libxlOSEventHookTimerInfo *info = timer_info;
+ /* libxl expects the event to be deregistered when calling + libxl_osevent_occurred_timeout, but we dont want the event info + destroyed. Disable the timeout and only remove it after returning + from libxl. */ + virEventUpdateTimeout(info->id, -1); libxl_osevent_occurred_timeout(info->priv->ctx, info->xl_priv); + virEventRemoveTimeout(info->id); }
static void @@ -202,6 +208,8 @@ libxlTimeoutRegisterEventHook(void *priv, void *xl_priv) { struct timeval now; + struct timeval res; + static struct timeval zero; struct libxlOSEventHookTimerInfo *timer_info; int timeout, timer_id;
@@ -211,8 +219,15 @@ libxlTimeoutRegisterEventHook(void *priv, }
gettimeofday(&now, NULL); - timeout = (abs_t.tv_usec - now.tv_usec) / 1000; - timeout += (abs_t.tv_sec - now.tv_sec) * 1000; + timersub(&abs_t, &now, &res); + /* Ensure timeout is not overflowed */ + if (timercmp(&res, &zero, <)) { + timeout = 0; + } else if (res.tv_sec > INT_MAX / 1000) { + timeout = INT_MAX; + } else { + timeout = res.tv_sec * 1000 + (res.tv_usec + 999) / 1000; + } timer_id = virEventAddTimeout(timeout, libxlTimerCallback, timer_info, libxlTimerInfoFree); if (timer_id < 0) { @@ -227,19 +242,25 @@ libxlTimeoutRegisterEventHook(void *priv, return 0; }
+/* + * Note: There are two changes wrt timeouts starting with xen-unstable + * changeset 26469: + * + * 1. Timeout modify callbacks will only be invoked with an abs_t of {0,0}, + * i.e. make the timeout fire immediately. Prior to this commit, timeout + * modify callbacks were never invoked. + * + * 2. Timeout deregister hooks will no longer be called. + */ static int libxlTimeoutModifyEventHook(void *priv ATTRIBUTE_UNUSED, void **hndp, - struct timeval abs_t) + struct timeval abs_t ATTRIBUTE_UNUSED) { - struct timeval now; - int timeout; struct libxlOSEventHookTimerInfo *timer_info = *hndp;
- gettimeofday(&now, NULL); - timeout = (abs_t.tv_usec - now.tv_usec) / 1000; - timeout += (abs_t.tv_sec - now.tv_sec) * 1000; - virEventUpdateTimeout(timer_info->id, timeout); + /* Make the timeout fire */ + virEventUpdateTimeout(timer_info->id, 0); return 0; }
ACK, I assume you tested with the older libxl, right ?
Yes, but with limited success due to the issue described here
http://lists.xen.org/archives/html/xen-devel/2012-11/msg01016.html
With Ian Jackson's libxl fixes, I've been able to do several hundred save/restore iterations. Without his libxl fixes, I'd be lucky to get beyond 5 iterations.
http://lists.xen.org/archives/html/xen-devel/2013-01/msg01971.html
Those libxl fixes will be included in the xen 4.2 branch too, and will be available in 4.2.2.
Okay, please push the set :-) thanks ! Daniel -- Daniel Veillard | Open Source and Standards, Red Hat veillard@redhat.com | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | virtualization library http://libvirt.org/

Daniel Veillard wrote: [...]
On Fri, Jan 25, 2013 at 10:21:12AM -0700, Jim Fehlig wrote:
Daniel Veillard wrote:
ACK, I assume you tested with the older libxl, right ?
Yes, but with limited success due to the issue described here
http://lists.xen.org/archives/html/xen-devel/2012-11/msg01016.html
With Ian Jackson's libxl fixes, I've been able to do several hundred save/restore iterations. Without his libxl fixes, I'd be lucky to get beyond 5 iterations.
http://lists.xen.org/archives/html/xen-devel/2013-01/msg01971.html
Those libxl fixes will be included in the xen 4.2 branch too, and will be available in 4.2.2.
Okay, please push the set :-)
I've fixed up the comments as you suggested and pushed the series. Thanks for the review! Regards, Jim

The libxl driver is racy in it's interactions with libxl and libvirt's event loop. The event loop can invoke callbacks after libxl has deregistered the event, and possibly access freed data associated with the event. This patch fixes the race by converting libxlDomainObjPrivate to a virObjectLockable, and locking it while executing libxl upcalls and libvirt event loop callbacks. Note that using the virDomainObj lock is not satisfactory since it may be desirable to hold the virDomainObj lock even when libxl events such as reading and writing to xenstore need processed. --- src/libxl/libxl_conf.h | 3 + src/libxl/libxl_driver.c | 140 ++++++++++++++++++++++++++++------------------- 2 files changed, 87 insertions(+), 56 deletions(-) diff --git a/src/libxl/libxl_conf.h b/src/libxl/libxl_conf.h index a3cce08..3d5a461 100644 --- a/src/libxl/libxl_conf.h +++ b/src/libxl/libxl_conf.h @@ -35,6 +35,7 @@ # include "capabilities.h" # include "configmake.h" # include "virportallocator.h" +# include "virobject.h" # define LIBXL_VNC_PORT_MIN 5900 @@ -81,6 +82,8 @@ struct _libxlDriverPrivate { typedef struct _libxlDomainObjPrivate libxlDomainObjPrivate; typedef libxlDomainObjPrivate *libxlDomainObjPrivatePtr; struct _libxlDomainObjPrivate { + virObjectLockable parent; + /* per domain libxl ctx */ libxl_ctx *ctx; libxl_evgen_domain_death *deathW; diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c index d4f38e3..81c04ed 100644 --- a/src/libxl/libxl_driver.c +++ b/src/libxl/libxl_driver.c @@ -59,18 +59,16 @@ /* Number of Xen scheduler parameters */ #define XEN_SCHED_CREDIT_NPARAM 2 -struct libxlOSEventHookFDInfo { - libxlDomainObjPrivatePtr priv; - void *xl_priv; - int watch; -}; - -struct libxlOSEventHookTimerInfo { +/* Object used to store info related to libxl event registrations */ +typedef struct _libxlEventHookInfo libxlEventHookInfo; +typedef libxlEventHookInfo *libxlEventHookInfoPtr; +struct _libxlEventHookInfo { libxlDomainObjPrivatePtr priv; void *xl_priv; int id; }; +static virClassPtr libxlDomainObjPrivateClass; static libxlDriverPrivatePtr libxl_driver = NULL; /* Function declarations */ @@ -84,6 +82,20 @@ libxlVmStart(libxlDriverPrivatePtr driver, virDomainObjPtr vm, bool start_paused, int restore_fd); /* Function definitions */ +static int +libxlDomainObjPrivateOnceInit(void) +{ + if (!(libxlDomainObjPrivateClass = virClassNew(virClassForObjectLockable(), + "libxlDomainObjPrivate", + sizeof(libxlDomainObjPrivate), + NULL))) + return -1; + + return 0; +} + +VIR_ONCE_GLOBAL_INIT(libxlDomainObjPrivate) + static void libxlDriverLock(libxlDriverPrivatePtr driver) { @@ -97,14 +109,21 @@ libxlDriverUnlock(libxlDriverPrivatePtr driver) } static void +libxlEventHookInfoFree(void *obj) +{ + VIR_FREE(obj); +} + +static void libxlFDEventCallback(int watch ATTRIBUTE_UNUSED, int fd, int vir_events, void *fd_info) { - struct libxlOSEventHookFDInfo *info = fd_info; + libxlEventHookInfoPtr info = fd_info; int events = 0; + virObjectLock(info->priv); if (vir_events & VIR_EVENT_HANDLE_READABLE) events |= POLLIN; if (vir_events & VIR_EVENT_HANDLE_WRITABLE) @@ -114,42 +133,37 @@ libxlFDEventCallback(int watch ATTRIBUTE_UNUSED, if (vir_events & VIR_EVENT_HANDLE_HANGUP) events |= POLLHUP; + virObjectUnlock(info->priv); libxl_osevent_occurred_fd(info->priv->ctx, info->xl_priv, fd, 0, events); } -static void -libxlFreeFDInfo(void *obj) -{ - VIR_FREE(obj); -} - static int libxlFDRegisterEventHook(void *priv, int fd, void **hndp, short events, void *xl_priv) { int vir_events = VIR_EVENT_HANDLE_ERROR; - struct libxlOSEventHookFDInfo *fdinfo; + libxlEventHookInfoPtr info; - if (VIR_ALLOC(fdinfo) < 0) { + if (VIR_ALLOC(info) < 0) { virReportOOMError(); return -1; } - fdinfo->priv = priv; - fdinfo->xl_priv = xl_priv; - *hndp = fdinfo; - if (events & POLLIN) vir_events |= VIR_EVENT_HANDLE_READABLE; if (events & POLLOUT) vir_events |= VIR_EVENT_HANDLE_WRITABLE; - fdinfo->watch = virEventAddHandle(fd, vir_events, libxlFDEventCallback, - fdinfo, libxlFreeFDInfo); - if (fdinfo->watch < 0) { - VIR_FREE(fdinfo); - return fdinfo->watch; + info->id = virEventAddHandle(fd, vir_events, libxlFDEventCallback, + info, libxlEventHookInfoFree); + if (info->id < 0) { + VIR_FREE(info); + return -1; } + info->priv = priv; + info->xl_priv = xl_priv; + *hndp = info; + return 0; } @@ -159,15 +173,18 @@ libxlFDModifyEventHook(void *priv ATTRIBUTE_UNUSED, void **hndp, short events) { - struct libxlOSEventHookFDInfo *fdinfo = *hndp; + libxlEventHookInfoPtr info = *hndp; int vir_events = VIR_EVENT_HANDLE_ERROR; + virObjectLock(info->priv); if (events & POLLIN) vir_events |= VIR_EVENT_HANDLE_READABLE; if (events & POLLOUT) vir_events |= VIR_EVENT_HANDLE_WRITABLE; - virEventUpdateHandle(fdinfo->watch, vir_events); + virEventUpdateHandle(info->id, vir_events); + virObjectUnlock(info->priv); + return 0; } @@ -176,29 +193,31 @@ libxlFDDeregisterEventHook(void *priv ATTRIBUTE_UNUSED, int fd ATTRIBUTE_UNUSED, void *hnd) { - struct libxlOSEventHookFDInfo *fdinfo = hnd; + libxlEventHookInfoPtr info = hnd; + libxlDomainObjPrivatePtr p = info->priv; - virEventRemoveHandle(fdinfo->watch); + virObjectLock(p); + virEventRemoveHandle(info->id); + virObjectUnlock(p); } static void libxlTimerCallback(int timer ATTRIBUTE_UNUSED, void *timer_info) { - struct libxlOSEventHookTimerInfo *info = timer_info; + libxlEventHookInfoPtr info = timer_info; + libxlDomainObjPrivatePtr p = info->priv; + virObjectLock(p); /* libxl expects the event to be deregistered when calling libxl_osevent_occurred_timeout, but we dont want the event info destroyed. Disable the timeout and only remove it after returning from libxl. */ virEventUpdateTimeout(info->id, -1); - libxl_osevent_occurred_timeout(info->priv->ctx, info->xl_priv); + virObjectUnlock(p); + libxl_osevent_occurred_timeout(p->ctx, info->xl_priv); + virObjectLock(p); virEventRemoveTimeout(info->id); -} - -static void -libxlTimerInfoFree(void* obj) -{ - VIR_FREE(obj); + virObjectUnlock(p); } static int @@ -207,13 +226,13 @@ libxlTimeoutRegisterEventHook(void *priv, struct timeval abs_t, void *xl_priv) { + libxlEventHookInfoPtr info; struct timeval now; struct timeval res; static struct timeval zero; - struct libxlOSEventHookTimerInfo *timer_info; - int timeout, timer_id; + int timeout; - if (VIR_ALLOC(timer_info) < 0) { + if (VIR_ALLOC(info) < 0) { virReportOOMError(); return -1; } @@ -228,17 +247,17 @@ libxlTimeoutRegisterEventHook(void *priv, } else { timeout = res.tv_sec * 1000 + (res.tv_usec + 999) / 1000; } - timer_id = virEventAddTimeout(timeout, libxlTimerCallback, - timer_info, libxlTimerInfoFree); - if (timer_id < 0) { - VIR_FREE(timer_info); - return timer_id; + info->id = virEventAddTimeout(timeout, libxlTimerCallback, + info, libxlEventHookInfoFree); + if (info->id < 0) { + VIR_FREE(info); + return -1; } - timer_info->priv = priv; - timer_info->xl_priv = xl_priv; - timer_info->id = timer_id; - *hndp = timer_info; + info->priv = priv; + info->xl_priv = xl_priv; + *hndp = info; + return 0; } @@ -257,10 +276,13 @@ libxlTimeoutModifyEventHook(void *priv ATTRIBUTE_UNUSED, void **hndp, struct timeval abs_t ATTRIBUTE_UNUSED) { - struct libxlOSEventHookTimerInfo *timer_info = *hndp; + libxlEventHookInfoPtr info = *hndp; + virObjectLock(info->priv); /* Make the timeout fire */ - virEventUpdateTimeout(timer_info->id, 0); + virEventUpdateTimeout(info->id, 0); + virObjectUnlock(info->priv); + return 0; } @@ -268,9 +290,12 @@ static void libxlTimeoutDeregisterEventHook(void *priv ATTRIBUTE_UNUSED, void *hnd) { - struct libxlOSEventHookTimerInfo *timer_info = hnd; + libxlEventHookInfoPtr info = hnd; + libxlDomainObjPrivatePtr p = info->priv; - virEventRemoveTimeout(timer_info->id); + virObjectLock(p); + virEventRemoveTimeout(info->id); + virObjectUnlock(p); } static const libxl_osevent_hooks libxl_event_callbacks = { @@ -287,12 +312,15 @@ libxlDomainObjPrivateAlloc(void) { libxlDomainObjPrivatePtr priv; - if (VIR_ALLOC(priv) < 0) + if (libxlDomainObjPrivateInitialize() < 0) + return NULL; + + if (!(priv = virObjectLockableNew(libxlDomainObjPrivateClass))) return NULL; if (libxl_ctx_alloc(&priv->ctx, LIBXL_VERSION, 0, libxl_driver->logger)) { VIR_ERROR(_("Failed libxl context initialization")); - VIR_FREE(priv); + virObjectUnref(priv); return NULL; } @@ -310,7 +338,7 @@ libxlDomainObjPrivateFree(void *data) libxl_evdisable_domain_death(priv->ctx, priv->deathW); libxl_ctx_free(priv->ctx); - VIR_FREE(priv); + virObjectUnref(priv); } /* driver must be locked before calling */ -- 1.8.0.1

On Mon, Jan 21, 2013 at 12:22:20PM -0700, Jim Fehlig wrote:
The libxl driver is racy in it's interactions with libxl and libvirt's event loop. The event loop can invoke callbacks after libxl has deregistered the event, and possibly access freed data associated with the event.
This patch fixes the race by converting libxlDomainObjPrivate to a virObjectLockable, and locking it while executing libxl upcalls and libvirt event loop callbacks.
Note that using the virDomainObj lock is not satisfactory since it may be desirable to hold the virDomainObj lock even when libxl events such as reading and writing to xenstore need processed. --- src/libxl/libxl_conf.h | 3 + src/libxl/libxl_driver.c | 140 ++++++++++++++++++++++++++++------------------- 2 files changed, 87 insertions(+), 56 deletions(-)
diff --git a/src/libxl/libxl_conf.h b/src/libxl/libxl_conf.h index a3cce08..3d5a461 100644 --- a/src/libxl/libxl_conf.h +++ b/src/libxl/libxl_conf.h @@ -35,6 +35,7 @@ # include "capabilities.h" # include "configmake.h" # include "virportallocator.h" +# include "virobject.h"
# define LIBXL_VNC_PORT_MIN 5900 @@ -81,6 +82,8 @@ struct _libxlDriverPrivate { typedef struct _libxlDomainObjPrivate libxlDomainObjPrivate; typedef libxlDomainObjPrivate *libxlDomainObjPrivatePtr; struct _libxlDomainObjPrivate { + virObjectLockable parent; + /* per domain libxl ctx */ libxl_ctx *ctx; libxl_evgen_domain_death *deathW; diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c index d4f38e3..81c04ed 100644 --- a/src/libxl/libxl_driver.c +++ b/src/libxl/libxl_driver.c @@ -59,18 +59,16 @@ /* Number of Xen scheduler parameters */ #define XEN_SCHED_CREDIT_NPARAM 2
-struct libxlOSEventHookFDInfo { - libxlDomainObjPrivatePtr priv; - void *xl_priv; - int watch; -}; - -struct libxlOSEventHookTimerInfo { +/* Object used to store info related to libxl event registrations */ +typedef struct _libxlEventHookInfo libxlEventHookInfo; +typedef libxlEventHookInfo *libxlEventHookInfoPtr; +struct _libxlEventHookInfo { libxlDomainObjPrivatePtr priv; void *xl_priv; int id; };
+static virClassPtr libxlDomainObjPrivateClass; static libxlDriverPrivatePtr libxl_driver = NULL;
/* Function declarations */ @@ -84,6 +82,20 @@ libxlVmStart(libxlDriverPrivatePtr driver, virDomainObjPtr vm, bool start_paused, int restore_fd);
/* Function definitions */ +static int +libxlDomainObjPrivateOnceInit(void) +{ + if (!(libxlDomainObjPrivateClass = virClassNew(virClassForObjectLockable(), + "libxlDomainObjPrivate", + sizeof(libxlDomainObjPrivate), + NULL))) + return -1; + + return 0; +} + +VIR_ONCE_GLOBAL_INIT(libxlDomainObjPrivate) + static void libxlDriverLock(libxlDriverPrivatePtr driver) { @@ -97,14 +109,21 @@ libxlDriverUnlock(libxlDriverPrivatePtr driver) }
static void +libxlEventHookInfoFree(void *obj) +{ + VIR_FREE(obj); +} + +static void libxlFDEventCallback(int watch ATTRIBUTE_UNUSED, int fd, int vir_events, void *fd_info) { - struct libxlOSEventHookFDInfo *info = fd_info; + libxlEventHookInfoPtr info = fd_info; int events = 0;
+ virObjectLock(info->priv); if (vir_events & VIR_EVENT_HANDLE_READABLE) events |= POLLIN; if (vir_events & VIR_EVENT_HANDLE_WRITABLE) @@ -114,42 +133,37 @@ libxlFDEventCallback(int watch ATTRIBUTE_UNUSED, if (vir_events & VIR_EVENT_HANDLE_HANGUP) events |= POLLHUP;
+ virObjectUnlock(info->priv); libxl_osevent_occurred_fd(info->priv->ctx, info->xl_priv, fd, 0, events); }
-static void -libxlFreeFDInfo(void *obj) -{ - VIR_FREE(obj); -} - static int libxlFDRegisterEventHook(void *priv, int fd, void **hndp, short events, void *xl_priv) { int vir_events = VIR_EVENT_HANDLE_ERROR; - struct libxlOSEventHookFDInfo *fdinfo; + libxlEventHookInfoPtr info;
- if (VIR_ALLOC(fdinfo) < 0) { + if (VIR_ALLOC(info) < 0) { virReportOOMError(); return -1; }
- fdinfo->priv = priv; - fdinfo->xl_priv = xl_priv; - *hndp = fdinfo; - if (events & POLLIN) vir_events |= VIR_EVENT_HANDLE_READABLE; if (events & POLLOUT) vir_events |= VIR_EVENT_HANDLE_WRITABLE; - fdinfo->watch = virEventAddHandle(fd, vir_events, libxlFDEventCallback, - fdinfo, libxlFreeFDInfo); - if (fdinfo->watch < 0) { - VIR_FREE(fdinfo); - return fdinfo->watch; + info->id = virEventAddHandle(fd, vir_events, libxlFDEventCallback, + info, libxlEventHookInfoFree); + if (info->id < 0) { + VIR_FREE(info); + return -1; }
+ info->priv = priv; + info->xl_priv = xl_priv; + *hndp = info; + return 0; }
@@ -159,15 +173,18 @@ libxlFDModifyEventHook(void *priv ATTRIBUTE_UNUSED, void **hndp, short events) { - struct libxlOSEventHookFDInfo *fdinfo = *hndp; + libxlEventHookInfoPtr info = *hndp; int vir_events = VIR_EVENT_HANDLE_ERROR;
+ virObjectLock(info->priv); if (events & POLLIN) vir_events |= VIR_EVENT_HANDLE_READABLE; if (events & POLLOUT) vir_events |= VIR_EVENT_HANDLE_WRITABLE;
- virEventUpdateHandle(fdinfo->watch, vir_events); + virEventUpdateHandle(info->id, vir_events); + virObjectUnlock(info->priv); + return 0; }
@@ -176,29 +193,31 @@ libxlFDDeregisterEventHook(void *priv ATTRIBUTE_UNUSED, int fd ATTRIBUTE_UNUSED, void *hnd) { - struct libxlOSEventHookFDInfo *fdinfo = hnd; + libxlEventHookInfoPtr info = hnd; + libxlDomainObjPrivatePtr p = info->priv;
- virEventRemoveHandle(fdinfo->watch); + virObjectLock(p); + virEventRemoveHandle(info->id); + virObjectUnlock(p); }
static void libxlTimerCallback(int timer ATTRIBUTE_UNUSED, void *timer_info) { - struct libxlOSEventHookTimerInfo *info = timer_info; + libxlEventHookInfoPtr info = timer_info; + libxlDomainObjPrivatePtr p = info->priv;
+ virObjectLock(p); /* libxl expects the event to be deregistered when calling libxl_osevent_occurred_timeout, but we dont want the event info destroyed. Disable the timeout and only remove it after returning from libxl. */ virEventUpdateTimeout(info->id, -1); - libxl_osevent_occurred_timeout(info->priv->ctx, info->xl_priv); + virObjectUnlock(p); + libxl_osevent_occurred_timeout(p->ctx, info->xl_priv); + virObjectLock(p); virEventRemoveTimeout(info->id); -} - -static void -libxlTimerInfoFree(void* obj) -{ - VIR_FREE(obj); + virObjectUnlock(p); }
static int @@ -207,13 +226,13 @@ libxlTimeoutRegisterEventHook(void *priv, struct timeval abs_t, void *xl_priv) { + libxlEventHookInfoPtr info; struct timeval now; struct timeval res; static struct timeval zero; - struct libxlOSEventHookTimerInfo *timer_info; - int timeout, timer_id; + int timeout;
- if (VIR_ALLOC(timer_info) < 0) { + if (VIR_ALLOC(info) < 0) { virReportOOMError(); return -1; } @@ -228,17 +247,17 @@ libxlTimeoutRegisterEventHook(void *priv, } else { timeout = res.tv_sec * 1000 + (res.tv_usec + 999) / 1000; } - timer_id = virEventAddTimeout(timeout, libxlTimerCallback, - timer_info, libxlTimerInfoFree); - if (timer_id < 0) { - VIR_FREE(timer_info); - return timer_id; + info->id = virEventAddTimeout(timeout, libxlTimerCallback, + info, libxlEventHookInfoFree); + if (info->id < 0) { + VIR_FREE(info); + return -1; }
- timer_info->priv = priv; - timer_info->xl_priv = xl_priv; - timer_info->id = timer_id; - *hndp = timer_info; + info->priv = priv; + info->xl_priv = xl_priv; + *hndp = info; + return 0; }
@@ -257,10 +276,13 @@ libxlTimeoutModifyEventHook(void *priv ATTRIBUTE_UNUSED, void **hndp, struct timeval abs_t ATTRIBUTE_UNUSED) { - struct libxlOSEventHookTimerInfo *timer_info = *hndp; + libxlEventHookInfoPtr info = *hndp;
+ virObjectLock(info->priv); /* Make the timeout fire */ - virEventUpdateTimeout(timer_info->id, 0); + virEventUpdateTimeout(info->id, 0); + virObjectUnlock(info->priv); + return 0; }
@@ -268,9 +290,12 @@ static void libxlTimeoutDeregisterEventHook(void *priv ATTRIBUTE_UNUSED, void *hnd) { - struct libxlOSEventHookTimerInfo *timer_info = hnd; + libxlEventHookInfoPtr info = hnd; + libxlDomainObjPrivatePtr p = info->priv;
- virEventRemoveTimeout(timer_info->id); + virObjectLock(p); + virEventRemoveTimeout(info->id); + virObjectUnlock(p); }
static const libxl_osevent_hooks libxl_event_callbacks = { @@ -287,12 +312,15 @@ libxlDomainObjPrivateAlloc(void) { libxlDomainObjPrivatePtr priv;
- if (VIR_ALLOC(priv) < 0) + if (libxlDomainObjPrivateInitialize() < 0) + return NULL; + + if (!(priv = virObjectLockableNew(libxlDomainObjPrivateClass))) return NULL;
if (libxl_ctx_alloc(&priv->ctx, LIBXL_VERSION, 0, libxl_driver->logger)) { VIR_ERROR(_("Failed libxl context initialization")); - VIR_FREE(priv); + virObjectUnref(priv); return NULL; }
@@ -310,7 +338,7 @@ libxlDomainObjPrivateFree(void *data) libxl_evdisable_domain_death(priv->ctx, priv->deathW);
libxl_ctx_free(priv->ctx); - VIR_FREE(priv); + virObjectUnref(priv); }
/* driver must be locked before calling */
ACK, adding a new lock allows to synchronize access, fine, i just hope there is no risk of deadlock, we will see :-) Daniel -- Daniel Veillard | Open Source and Standards, Red Hat veillard@redhat.com | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | virtualization library http://libvirt.org/

It is possible to destroy and cleanup a VM, resulting in freeing the libxlDomainObjPrivate object and associated libxl ctx, before all fds and timeouts have been deregistered and destroyed. Fix this race by incrementing the reference count on libxlDomainObjPrivate for each fd and timeout registration. Only when all fds and timeouts are deregistered and destroyed will the libxlDomainObjPrivate be destroyed. --- src/libxl/libxl_driver.c | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c index 81c04ed..530a17f 100644 --- a/src/libxl/libxl_driver.c +++ b/src/libxl/libxl_driver.c @@ -111,7 +111,11 @@ libxlDriverUnlock(libxlDriverPrivatePtr driver) static void libxlEventHookInfoFree(void *obj) { - VIR_FREE(obj); + libxlEventHookInfoPtr info = obj; + + /* Drop reference on libxlDomainObjPrivate */ + virObjectUnref(info->priv); + VIR_FREE(info); } static void @@ -161,6 +165,11 @@ libxlFDRegisterEventHook(void *priv, int fd, void **hndp, } info->priv = priv; + /* Take a reference on the domain object. Reference is dropped in + libxlEventHookInfoFree, ensuring the domain object outlives the fd + event objects. */ + virObjectRef(info->priv); + info->xl_priv = xl_priv; *hndp = info; @@ -255,6 +264,11 @@ libxlTimeoutRegisterEventHook(void *priv, } info->priv = priv; + /* Also take a reference on the domain object. Reference is dropped in + libxlEventHookInfoFree, ensuring the domain object outlives the timeout + event objects. */ + virObjectRef(info->priv); + info->xl_priv = xl_priv; *hndp = info; -- 1.8.0.1

On Mon, Jan 21, 2013 at 12:22:21PM -0700, Jim Fehlig wrote:
It is possible to destroy and cleanup a VM, resulting in freeing the libxlDomainObjPrivate object and associated libxl ctx, before all fds and timeouts have been deregistered and destroyed.
Fix this race by incrementing the reference count on libxlDomainObjPrivate for each fd and timeout registration. Only when all fds and timeouts are deregistered and destroyed will the libxlDomainObjPrivate be destroyed. --- src/libxl/libxl_driver.c | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c index 81c04ed..530a17f 100644 --- a/src/libxl/libxl_driver.c +++ b/src/libxl/libxl_driver.c @@ -111,7 +111,11 @@ libxlDriverUnlock(libxlDriverPrivatePtr driver) static void libxlEventHookInfoFree(void *obj) { - VIR_FREE(obj); + libxlEventHookInfoPtr info = obj; + + /* Drop reference on libxlDomainObjPrivate */ + virObjectUnref(info->priv); + VIR_FREE(info); }
static void @@ -161,6 +165,11 @@ libxlFDRegisterEventHook(void *priv, int fd, void **hndp, }
info->priv = priv; + /* Take a reference on the domain object. Reference is dropped in + libxlEventHookInfoFree, ensuring the domain object outlives the fd + event objects. */ + virObjectRef(info->priv); + info->xl_priv = xl_priv; *hndp = info;
@@ -255,6 +264,11 @@ libxlTimeoutRegisterEventHook(void *priv, }
info->priv = priv; + /* Also take a reference on the domain object. Reference is dropped in + libxlEventHookInfoFree, ensuring the domain object outlives the timeout + event objects. */ + virObjectRef(info->priv); + info->xl_priv = xl_priv; *hndp = info;
Sounds good, ACK Daniel -- Daniel Veillard | Open Source and Standards, Red Hat veillard@redhat.com | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | virtualization library http://libvirt.org/

I've noticed that libxl can invoke timeout reregister/modify hooks after returning from libxl_ctx_free. Explicitly remove the timeouts before freeing the libxl ctx to avoid executing hooks on stale objects. --- src/libxl/libxl_conf.h | 6 ++++ src/libxl/libxl_driver.c | 71 +++++++++++++++++++++++++++++++++++++++++++++--- 2 files changed, 73 insertions(+), 4 deletions(-) diff --git a/src/libxl/libxl_conf.h b/src/libxl/libxl_conf.h index 3d5a461..f6167e6 100644 --- a/src/libxl/libxl_conf.h +++ b/src/libxl/libxl_conf.h @@ -79,6 +79,9 @@ struct _libxlDriverPrivate { char *saveDir; }; +typedef struct _libxlEventHookInfo libxlEventHookInfo; +typedef libxlEventHookInfo *libxlEventHookInfoPtr; + typedef struct _libxlDomainObjPrivate libxlDomainObjPrivate; typedef libxlDomainObjPrivate *libxlDomainObjPrivatePtr; struct _libxlDomainObjPrivate { @@ -87,6 +90,9 @@ struct _libxlDomainObjPrivate { /* per domain libxl ctx */ libxl_ctx *ctx; libxl_evgen_domain_death *deathW; + + /* list of libxl timeout registrations */ + libxlEventHookInfoPtr timerRegistrations; }; # define LIBXL_SAVE_MAGIC "libvirt-xml\n \0 \r" diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c index 530a17f..08dffd6 100644 --- a/src/libxl/libxl_driver.c +++ b/src/libxl/libxl_driver.c @@ -59,10 +59,39 @@ /* Number of Xen scheduler parameters */ #define XEN_SCHED_CREDIT_NPARAM 2 +/* Append an event registration to the list of registrations */ +#define LIBXL_EV_REG_APPEND(head, add) \ + do { \ + libxlEventHookInfoPtr temp; \ + if (head) { \ + temp = head; \ + while (temp->next) \ + temp = temp->next; \ + temp->next = add; \ + } else { \ + head = add; \ + } \ + } while (0) + +/* Remove an event registration from the list of registrations */ +#define LIBXL_EV_REG_REMOVE(head, del) \ + do { \ + libxlEventHookInfoPtr temp; \ + if (head == del) { \ + head = head->next; \ + } else { \ + temp = head; \ + while (temp->next && temp->next != del) \ + temp = temp->next; \ + if (temp->next) { \ + temp->next = del->next; \ + } \ + } \ + } while (0) + /* Object used to store info related to libxl event registrations */ -typedef struct _libxlEventHookInfo libxlEventHookInfo; -typedef libxlEventHookInfo *libxlEventHookInfoPtr; struct _libxlEventHookInfo { + libxlEventHookInfoPtr next; libxlDomainObjPrivatePtr priv; void *xl_priv; int id; @@ -225,7 +254,11 @@ libxlTimerCallback(int timer ATTRIBUTE_UNUSED, void *timer_info) virObjectUnlock(p); libxl_osevent_occurred_timeout(p->ctx, info->xl_priv); virObjectLock(p); - virEventRemoveTimeout(info->id); + /* timeout could have been freed while the lock was dropped. + Only remove it from the list if it still exists. + */ + if (virEventRemoveTimeout(info->id) == 0) + LIBXL_EV_REG_REMOVE(p->timerRegistrations, info); virObjectUnlock(p); } @@ -269,6 +302,9 @@ libxlTimeoutRegisterEventHook(void *priv, event objects. */ virObjectRef(info->priv); + virObjectLock(info->priv); + LIBXL_EV_REG_APPEND(info->priv->timerRegistrations, info); + virObjectUnlock(info->priv); info->xl_priv = xl_priv; *hndp = info; @@ -308,10 +344,35 @@ libxlTimeoutDeregisterEventHook(void *priv ATTRIBUTE_UNUSED, libxlDomainObjPrivatePtr p = info->priv; virObjectLock(p); - virEventRemoveTimeout(info->id); + /* Only remove the timeout from the list if removal from the + event loop is successful. + */ + if (virEventRemoveTimeout(info->id) == 0) + LIBXL_EV_REG_REMOVE(p->timerRegistrations, info); virObjectUnlock(p); } +static void +libxlRegisteredTimeoutsCleanup(libxlDomainObjPrivatePtr priv) +{ + libxlEventHookInfoPtr info; + + virObjectLock(priv); + info = priv->timerRegistrations; + while (info) { + /* libxl expects the event to be deregistered when calling + libxl_osevent_occurred_timeout, but we dont want the event info + destroyed. Disable the timeout and only remove it after returning + from libxl. */ + virEventUpdateTimeout(info->id, -1); + libxl_osevent_occurred_timeout(priv->ctx, info->xl_priv); + virEventRemoveTimeout(info->id); + info = info->next; + } + priv->timerRegistrations = NULL; + virObjectUnlock(priv); +} + static const libxl_osevent_hooks libxl_event_callbacks = { .fd_register = libxlFDRegisterEventHook, .fd_modify = libxlFDModifyEventHook, @@ -565,6 +626,8 @@ libxlVmCleanup(libxlDriverPrivatePtr driver, vm->def->id = -1; vm->newDef = NULL; } + + libxlRegisteredTimeoutsCleanup(priv); } /* -- 1.8.0.1

On Mon, Jan 21, 2013 at 12:22:22PM -0700, Jim Fehlig wrote:
I've noticed that libxl can invoke timeout reregister/modify hooks after returning from libxl_ctx_free. Explicitly remove the timeouts before freeing the libxl ctx to avoid executing hooks on stale objects. --- src/libxl/libxl_conf.h | 6 ++++ src/libxl/libxl_driver.c | 71 +++++++++++++++++++++++++++++++++++++++++++++--- 2 files changed, 73 insertions(+), 4 deletions(-)
diff --git a/src/libxl/libxl_conf.h b/src/libxl/libxl_conf.h index 3d5a461..f6167e6 100644 --- a/src/libxl/libxl_conf.h +++ b/src/libxl/libxl_conf.h @@ -79,6 +79,9 @@ struct _libxlDriverPrivate { char *saveDir; };
+typedef struct _libxlEventHookInfo libxlEventHookInfo; +typedef libxlEventHookInfo *libxlEventHookInfoPtr; + typedef struct _libxlDomainObjPrivate libxlDomainObjPrivate; typedef libxlDomainObjPrivate *libxlDomainObjPrivatePtr; struct _libxlDomainObjPrivate { @@ -87,6 +90,9 @@ struct _libxlDomainObjPrivate { /* per domain libxl ctx */ libxl_ctx *ctx; libxl_evgen_domain_death *deathW; + + /* list of libxl timeout registrations */ + libxlEventHookInfoPtr timerRegistrations; };
# define LIBXL_SAVE_MAGIC "libvirt-xml\n \0 \r" diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c index 530a17f..08dffd6 100644 --- a/src/libxl/libxl_driver.c +++ b/src/libxl/libxl_driver.c @@ -59,10 +59,39 @@ /* Number of Xen scheduler parameters */ #define XEN_SCHED_CREDIT_NPARAM 2
+/* Append an event registration to the list of registrations */ +#define LIBXL_EV_REG_APPEND(head, add) \ + do { \ + libxlEventHookInfoPtr temp; \ + if (head) { \ + temp = head; \ + while (temp->next) \ + temp = temp->next; \ + temp->next = add; \ + } else { \ + head = add; \ + } \ + } while (0) + +/* Remove an event registration from the list of registrations */ +#define LIBXL_EV_REG_REMOVE(head, del) \ + do { \ + libxlEventHookInfoPtr temp; \ + if (head == del) { \ + head = head->next; \ + } else { \ + temp = head; \ + while (temp->next && temp->next != del) \ + temp = temp->next; \ + if (temp->next) { \ + temp->next = del->next; \ + } \ + } \ + } while (0) + /* Object used to store info related to libxl event registrations */ -typedef struct _libxlEventHookInfo libxlEventHookInfo; -typedef libxlEventHookInfo *libxlEventHookInfoPtr; struct _libxlEventHookInfo { + libxlEventHookInfoPtr next; libxlDomainObjPrivatePtr priv; void *xl_priv; int id; @@ -225,7 +254,11 @@ libxlTimerCallback(int timer ATTRIBUTE_UNUSED, void *timer_info) virObjectUnlock(p); libxl_osevent_occurred_timeout(p->ctx, info->xl_priv); virObjectLock(p); - virEventRemoveTimeout(info->id); + /* timeout could have been freed while the lock was dropped. + Only remove it from the list if it still exists. + */ + if (virEventRemoveTimeout(info->id) == 0) + LIBXL_EV_REG_REMOVE(p->timerRegistrations, info); virObjectUnlock(p); }
@@ -269,6 +302,9 @@ libxlTimeoutRegisterEventHook(void *priv, event objects. */ virObjectRef(info->priv);
+ virObjectLock(info->priv); + LIBXL_EV_REG_APPEND(info->priv->timerRegistrations, info); + virObjectUnlock(info->priv); info->xl_priv = xl_priv; *hndp = info;
@@ -308,10 +344,35 @@ libxlTimeoutDeregisterEventHook(void *priv ATTRIBUTE_UNUSED, libxlDomainObjPrivatePtr p = info->priv;
virObjectLock(p); - virEventRemoveTimeout(info->id); + /* Only remove the timeout from the list if removal from the + event loop is successful. + */ + if (virEventRemoveTimeout(info->id) == 0) + LIBXL_EV_REG_REMOVE(p->timerRegistrations, info); virObjectUnlock(p); }
+static void +libxlRegisteredTimeoutsCleanup(libxlDomainObjPrivatePtr priv) +{ + libxlEventHookInfoPtr info; + + virObjectLock(priv); + info = priv->timerRegistrations; + while (info) { + /* libxl expects the event to be deregistered when calling + libxl_osevent_occurred_timeout, but we dont want the event info + destroyed. Disable the timeout and only remove it after returning + from libxl. */ + virEventUpdateTimeout(info->id, -1); + libxl_osevent_occurred_timeout(priv->ctx, info->xl_priv); + virEventRemoveTimeout(info->id); + info = info->next; + } + priv->timerRegistrations = NULL; + virObjectUnlock(priv); +} + static const libxl_osevent_hooks libxl_event_callbacks = { .fd_register = libxlFDRegisterEventHook, .fd_modify = libxlFDModifyEventHook, @@ -565,6 +626,8 @@ libxlVmCleanup(libxlDriverPrivatePtr driver, vm->def->id = -1; vm->newDef = NULL; } + + libxlRegisteredTimeoutsCleanup(priv); }
/*
ACK, just one small nit, usually we format multi-line comments as /* * bla * bla */ could you update your patches accordingly :-) ? thanks ! Daniel -- Daniel Veillard | Open Source and Standards, Red Hat veillard@redhat.com | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | virtualization library http://libvirt.org/

libxlDoDomainSave() was removing non-persistent domains, but required callers to have the virDomainObj locked. Callers could potentially unlock an already freed virDomainObj. Move this logic to the callers of libxlDoDomainSave(). --- src/libxl/libxl_driver.c | 26 ++++++++++++++++++-------- 1 file changed, 18 insertions(+), 8 deletions(-) diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c index 08dffd6..7484b83 100644 --- a/src/libxl/libxl_driver.c +++ b/src/libxl/libxl_driver.c @@ -2118,12 +2118,6 @@ libxlDoDomainSave(libxlDriverPrivatePtr driver, virDomainObjPtr vm, } vm->hasManagedSave = true; - - if (!vm->persistent) { - virDomainRemoveInactive(&driver->domains, vm); - vm = NULL; - } - ret = 0; cleanup: @@ -2166,7 +2160,15 @@ libxlDomainSaveFlags(virDomainPtr dom, const char *to, const char *dxml, goto cleanup; } - ret = libxlDoDomainSave(driver, vm, to); + if (libxlDoDomainSave(driver, vm, to) < 0) + goto cleanup; + + if (!vm->persistent) { + virDomainRemoveInactive(&driver->domains, vm); + vm = NULL; + } + + ret = 0; cleanup: if (vm) @@ -2365,7 +2367,15 @@ libxlDomainManagedSave(virDomainPtr dom, unsigned int flags) VIR_INFO("Saving state to %s", name); - ret = libxlDoDomainSave(driver, vm, name); + if (libxlDoDomainSave(driver, vm, name) < 0) + goto cleanup; + + if (!vm->persistent) { + virDomainRemoveInactive(&driver->domains, vm); + vm = NULL; + } + + ret = 0; cleanup: if (vm) -- 1.8.0.1

On Mon, Jan 21, 2013 at 12:22:23PM -0700, Jim Fehlig wrote:
libxlDoDomainSave() was removing non-persistent domains, but required callers to have the virDomainObj locked. Callers could potentially unlock an already freed virDomainObj. Move this logic to the callers of libxlDoDomainSave(). --- src/libxl/libxl_driver.c | 26 ++++++++++++++++++-------- 1 file changed, 18 insertions(+), 8 deletions(-)
diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c index 08dffd6..7484b83 100644 --- a/src/libxl/libxl_driver.c +++ b/src/libxl/libxl_driver.c @@ -2118,12 +2118,6 @@ libxlDoDomainSave(libxlDriverPrivatePtr driver, virDomainObjPtr vm, }
vm->hasManagedSave = true; - - if (!vm->persistent) { - virDomainRemoveInactive(&driver->domains, vm); - vm = NULL; - } - ret = 0;
cleanup: @@ -2166,7 +2160,15 @@ libxlDomainSaveFlags(virDomainPtr dom, const char *to, const char *dxml, goto cleanup; }
- ret = libxlDoDomainSave(driver, vm, to); + if (libxlDoDomainSave(driver, vm, to) < 0) + goto cleanup; + + if (!vm->persistent) { + virDomainRemoveInactive(&driver->domains, vm); + vm = NULL; + } + + ret = 0;
cleanup: if (vm) @@ -2365,7 +2367,15 @@ libxlDomainManagedSave(virDomainPtr dom, unsigned int flags)
VIR_INFO("Saving state to %s", name);
- ret = libxlDoDomainSave(driver, vm, name); + if (libxlDoDomainSave(driver, vm, name) < 0) + goto cleanup; + + if (!vm->persistent) { + virDomainRemoveInactive(&driver->domains, vm); + vm = NULL; + } + + ret = 0;
cleanup: if (vm)
ACK, Daniel -- Daniel Veillard | Open Source and Standards, Red Hat veillard@redhat.com | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | virtualization library http://libvirt.org/

Since libxl provides the domain ID in the event handler callback, find the domain object based on the ID. This approach prevents processing the callback on a domain that has already been reaped. --- src/libxl/libxl_driver.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c index 7484b83..e28b641 100644 --- a/src/libxl/libxl_driver.c +++ b/src/libxl/libxl_driver.c @@ -656,22 +656,22 @@ libxlVmReap(libxlDriverPrivatePtr driver, * Handle previously registered event notification from libxenlight */ static void -libxlEventHandler(void *data, const libxl_event *event) +libxlEventHandler(void *data ATTRIBUTE_UNUSED, const libxl_event *event) { libxlDriverPrivatePtr driver = libxl_driver; - virDomainObjPtr vm = data; + virDomainObjPtr vm = NULL; virDomainEventPtr dom_event = NULL; libxlDriverLock(driver); - virObjectLock(vm); + vm = virDomainFindByID(&driver->domains, event->domid); libxlDriverUnlock(driver); + if (!vm) + goto cleanup; + if (event->type == LIBXL_EVENT_TYPE_DOMAIN_SHUTDOWN) { virDomainShutoffReason reason; - if (event->domid != vm->def->id) - goto cleanup; - switch (event->u.domain_shutdown.shutdown_reason) { case LIBXL_SHUTDOWN_REASON_POWEROFF: case LIBXL_SHUTDOWN_REASON_CRASH: -- 1.8.0.1

Jim Fehlig wrote:
Since libxl provides the domain ID in the event handler callback, find the domain object based on the ID. This approach prevents processing the callback on a domain that has already been reaped. --- src/libxl/libxl_driver.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c index 7484b83..e28b641 100644 --- a/src/libxl/libxl_driver.c +++ b/src/libxl/libxl_driver.c @@ -656,22 +656,22 @@ libxlVmReap(libxlDriverPrivatePtr driver, * Handle previously registered event notification from libxenlight */ static void -libxlEventHandler(void *data, const libxl_event *event) +libxlEventHandler(void *data ATTRIBUTE_UNUSED, const libxl_event *event) { libxlDriverPrivatePtr driver = libxl_driver; - virDomainObjPtr vm = data; + virDomainObjPtr vm = NULL; virDomainEventPtr dom_event = NULL;
libxlDriverLock(driver);
On xen-unstable, I noticed an occasional deadlock here when libxl invokes the event handler with a SUSPEND shutdown reason. The driver lock is already held by the caller of libxl_domain_suspend(). Inspired by the xl implementation, I've updated this patch to ignore the SUSPEND shutdown reason before acquiring the driver lock. Regards, Jim

On Thu, Jan 24, 2013 at 12:45:21PM -0700, Jim Fehlig wrote:
Jim Fehlig wrote:
Since libxl provides the domain ID in the event handler callback, find the domain object based on the ID. This approach prevents processing the callback on a domain that has already been reaped. --- src/libxl/libxl_driver.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c index 7484b83..e28b641 100644 --- a/src/libxl/libxl_driver.c +++ b/src/libxl/libxl_driver.c @@ -656,22 +656,22 @@ libxlVmReap(libxlDriverPrivatePtr driver, * Handle previously registered event notification from libxenlight */ static void -libxlEventHandler(void *data, const libxl_event *event) +libxlEventHandler(void *data ATTRIBUTE_UNUSED, const libxl_event *event) { libxlDriverPrivatePtr driver = libxl_driver; - virDomainObjPtr vm = data; + virDomainObjPtr vm = NULL; virDomainEventPtr dom_event = NULL;
libxlDriverLock(driver);
On xen-unstable, I noticed an occasional deadlock here when libxl invokes the event handler with a SUSPEND shutdown reason. The driver lock is already held by the caller of libxl_domain_suspend(). Inspired by the xl implementation, I've updated this patch to ignore the SUSPEND shutdown reason before acquiring the driver lock.
Regards, Jim
From b9b3cd0259168ccf10f525d1b459bfe790162be3 Mon Sep 17 00:00:00 2001 From: Jim Fehlig <jfehlig@suse.com> Date: Mon, 21 Jan 2013 10:36:03 -0700 Subject: [PATCH 6/6] libxl: Domain event handler improvements
Since libxl provides the domain ID in the event handler callback, find the domain object based on the ID. This approach prevents processing the callback on a domain that has already been reaped.
Also, similar to the xl implementation, ignore the SUSPEND shutdown reason. By calling libxl_domain_suspend(), we know a shutdown event with SUSPEND reason will be generated, but it can be safely ignored since any subsequent cleanup will be done by the callers. --- src/libxl/libxl_driver.c | 26 ++++++++++++++++---------- 1 file changed, 16 insertions(+), 10 deletions(-)
diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c index 7484b83..39875aa 100644 --- a/src/libxl/libxl_driver.c +++ b/src/libxl/libxl_driver.c @@ -656,26 +656,32 @@ libxlVmReap(libxlDriverPrivatePtr driver, * Handle previously registered event notification from libxenlight */ static void -libxlEventHandler(void *data, const libxl_event *event) +libxlEventHandler(void *data ATTRIBUTE_UNUSED, const libxl_event *event) { libxlDriverPrivatePtr driver = libxl_driver; - virDomainObjPtr vm = data; + virDomainObjPtr vm = NULL; virDomainEventPtr dom_event = NULL; - - libxlDriverLock(driver); - virObjectLock(vm); - libxlDriverUnlock(driver); + libxl_shutdown_reason xl_reason = event->u.domain_shutdown.shutdown_reason;
if (event->type == LIBXL_EVENT_TYPE_DOMAIN_SHUTDOWN) { virDomainShutoffReason reason;
- if (event->domid != vm->def->id) + /* Similar to the xl implementation, ignore SUSPEND. Any actions needed + after calling libxl_domain_suspend() are handled by it's callers. */ + if (xl_reason == LIBXL_SHUTDOWN_REASON_SUSPEND) + goto cleanup; + + libxlDriverLock(driver); + vm = virDomainFindByID(&driver->domains, event->domid); + libxlDriverUnlock(driver); + + if (!vm) goto cleanup;
- switch (event->u.domain_shutdown.shutdown_reason) { + switch (xl_reason) { case LIBXL_SHUTDOWN_REASON_POWEROFF: case LIBXL_SHUTDOWN_REASON_CRASH: - if (event->u.domain_shutdown.shutdown_reason == LIBXL_SHUTDOWN_REASON_CRASH) { + if (xl_reason == LIBXL_SHUTDOWN_REASON_CRASH) { dom_event = virDomainEventNewFromObj(vm, VIR_DOMAIN_EVENT_STOPPED, VIR_DOMAIN_EVENT_STOPPED_CRASHED); @@ -694,7 +700,7 @@ libxlEventHandler(void *data, const libxl_event *event) libxlVmStart(driver, vm, 0, -1); break; default: - VIR_INFO("Unhandled shutdown_reason %d", event->u.domain_shutdown.shutdown_reason); + VIR_INFO("Unhandled shutdown_reason %d", xl_reason); break; } } -- 1.8.0.1
ACK, Daniel -- Daniel Veillard | Open Source and Standards, Red Hat veillard@redhat.com | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | virtualization library http://libvirt.org/

Jim Fehlig wrote:
This series fixes several bugs and races in the libxl driver, as described in the commit messages of the individual patches.
One of the races was discussed on the xen-devel ml [1] and resulted in two fixes in libxl [2,3]. I've tested this series with and without the libxl patches. All of the previously discovered races are fixed with this series and the libxl patches. It is possible to fire asserts in libxl with this series and without the libxl patches, but that is no worse than the current behavior.
The libxl patches have not yet been committed upstream, hence the 'XXXX' for commit id in patch 1. I will respond to the xen-devel thread now with an ACK on the libxl patches (and a reference to this series) so those can be committed.
http://lists.xen.org/archives/html/xen-devel/2013-01/msg01672.html
participants (2)
-
Daniel Veillard
-
Jim Fehlig