On Tue, 2009-08-18 at 16:24 -0700, Kaitlin Rupert wrote:
Hi Hollis,
I applied your patch and the patches Richard submitted to do some
testing. I noticed a few issues commented below...
> +/* Delete all watches marked for deletion. */
> +static void event_watch_free_deleted(void)
> +{
> + struct watch *cur;
> + struct watch **link;
> +
> + CU_DEBUG("%s", __func__);
> +
> + pthread_mutex_lock(&watch_list_mutex);
> +
> + cur = watch_list;
> + link = &watch_list;
> + while (cur != NULL) {
> + struct watch *next = cur->next;
> +
> + if (cur->deleted) {
> + *link = next;
> +
> + cur->ff(cur->opaque);
I'm seeing a seg fault here because once we reach this point, cur is NULL.
> + free(cur);
> + watch_count--;
> + } else
> + link = &cur->next;
> +
> + cur = next;
> + }
> +
> + pthread_mutex_unlock(&watch_list_mutex);
> +}
> +
Am I missing something obvious? The loop only runs while cur is not
NULL.
Moreover, the list is locked, so concurrent list modifications wouldn't
even explain it... unless the other user didn't take the lock (which we
do on purpose sometimes). From the CU_DEBUG log, can you tell if there's
any concurrency going on?
> +/* One thread to watch all fds for all events for all libvirt
threads. */
> +static void *event_thread(void *ptr)
> +{
> + while (1) {
> + struct watch *cur;
> + struct pollfd *pollfds;
> + struct pollfd *pollfd;
> + int timeout;
> + int i;
> +
> + pollfds = malloc(sizeof(struct pollfd) * watch_count);
> +
> + /* fill in pollfds array from our watch list */
> + for (pollfd = &pollfds[0], cur = watch_list;
> + cur != NULL;
> + pollfd++, cur = cur->next) {
> + pollfd->fd = cur->fd;
> + pollfd->events =
libvirt_to_poll_events(cur->events);
> + }
> +
> + timeout = event_next_timeout();
> +
> + poll(pollfds, watch_count, timeout);
> +
> + /* invoke callbacks */
> + for (i = 0; i < watch_count; i++)
> + for (cur = watch_list; cur != NULL; cur = cur->next)
> + if (cur->fd == pollfds[i].fd
When I generate a event, poll never seems to catch it. I tried forcing
this by changing the timeout value to a minute. And then generating the
event. I can see from the libvirt debug that the event has been generated:
15:00:34.232: debug : virEventRunOnce:567 : Poll got 1 event
15:00:34.239: debug : virEventDispatchHandles:450 : Dispatch n=2 f=8 w=3
e=1 0x7f2a57d6e6b0
These messages are from qemud? So the event is generated on the other
side of the "remote" pipe?
In the eventAddHandle() call, we have:
event.c(75): eventAddHandle
event.c(82): ++++++++++++++watch->id is 0
event.c(84): ++++++++++++++watch->fd is 11
event.c(86): ++++++++++++++watch->events is 1
event.c(88): ++++++++++++++watch->cb is 0x6b8f4c0, cb is 0x6b8f4c0
event.c(90): ++++++++++++++watch->opaque is 0x7fffdc014900
These seem reasonable, so we should have a list with one entry.
I haven't tracked down what is happening here.
> + && !cur->deleted) {
> + invoke_callback(cur, &pollfds[i]);
> + break;
> + }
> +
> + free(pollfds);
> +
> + event_watch_free_deleted();
> + event_timer_free_deleted();
> + }
> +
> + return NULL;
> +}
Hmm, actually I'd expect the opposite: this code seems to *always*
invoke the callback, even when no event is pending (oops!).
Since that's not happening, it may be that the pollfds structure isn't
being created properly, in particular the fd member.
Another possibility is that watch_count is somehow 0, which would also
hose pollfds allocation...
I guess strace output (for poll()), CU_DEBUG logs, and some gdb work
would help.
--
Hollis Blanchard
IBM Linux Technology Center