Re: [Libvirt-cim] [PATCH] [RFC] Implement libvirt event callback management

Tuesday, 18 August 2009

Hollis Blanchard wrote:
...
 On Tue, 2009-08-18 at 16:24 -0700, Kaitlin Rupert wrote:
> Hi Hollis,
>
> I applied your patch and the patches Richard submitted to do some 
> testing. I noticed a few issues commented below...
>
>
>> +/* Delete all watches marked for deletion. */
>> +static void event_watch_free_deleted(void)
>> +{
>> +        struct watch *cur;
>> +        struct watch **link;
>> +
>> +        CU_DEBUG("%s", __func__);
>> +
>> +        pthread_mutex_lock(&watch_list_mutex);
>> +
>> +        cur = watch_list; 
>> +        link = &watch_list;
>> +        while (cur != NULL) {
>> +                struct watch *next = cur->next;
>> +
>> +                if (cur->deleted) {
>> +                        *link = next;
>> +
>> +                        cur->ff(cur->opaque);
> I'm seeing a seg fault here because once we reach this point, cur is NULL.
>
>> +                        free(cur);
>> +                        watch_count--;
>> +                } else
>> +                        link = &cur->next;
>> +
>> +                cur = next;
>> +        }
>> +
>> +        pthread_mutex_unlock(&watch_list_mutex);
>> +}
>> +

 Am I missing something obvious? The loop only runs while cur is not
 NULL. 
Agreed.  I meant to say in my previous email that I am confused as well. 
  Actually, I put a check for (cur != NULL) right about the call to 
cur->ff - and cur is definitely NULL at that point.

I haven't spent enough time debugging it, here's what little I have so far:

event.c(142): event_watch_free_deleted
event.c(146): +++++++++watch_list addr is 0x7fffdc014b30

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff61c6710 (LWP 22188)]
0x0000000000000000 in ?? ()
Missing separate debuginfos, use: debuginfo-install 
tog-pegasus-2.9.0-3.fc12.x86_64
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00000000069811f1 in event_watch_free_deleted () at event.c:156
#2  event_thread () at event.c:375
#3  0x0000000003c5e91a in start_thread () from /lib64/libpthread.so.0
#4  0x0000000004936b2d in clone () from /lib64/libc.so.6
#5  0x0000000000000000 in ?? ()
(gdb) up
#1  0x00000000069811f1 in event_watch_free_deleted () at event.c:156
156	                        cur->ff(cur->opaque);
(gdb) print cur
$1 = (struct watch *) 0x0
(gdb)

...

 Moreover, the list is locked, so concurrent list modifications wouldn't
 even explain it... unless the other user didn't take the lock (which we
 do on purpose sometimes). From the CU_DEBUG log, can you tell if there's
 any concurrency going on? 
I'll investigate this more tomorrow.  Richard - or maybe you can?

...

>> +/* One thread to watch all fds for all events for all libvirt threads. */
>> +static void *event_thread(void *ptr)
>> +{
>> +        while (1) {
>> +                struct watch *cur;
>> +                struct pollfd *pollfds;
>> +                struct pollfd *pollfd;
>> +                int timeout;
>> +                int i;
>> +
>> +                pollfds = malloc(sizeof(struct pollfd) * watch_count);
>> +
>> +                /* fill in pollfds array from our watch list */
>> +                for (pollfd = &pollfds[0], cur = watch_list;
>> +                     cur != NULL;
>> +                     pollfd++, cur = cur->next) {
>> +                        pollfd->fd = cur->fd;
>> +                        pollfd->events =
libvirt_to_poll_events(cur->events);
>> +                }
>> +
>> +                timeout = event_next_timeout();
>> +
>> +                poll(pollfds, watch_count, timeout);
>> +
>> +                /* invoke callbacks */
>> +                for (i = 0; i < watch_count; i++)
>> +                        for (cur = watch_list; cur != NULL; cur = cur->next)
>> +                                if (cur->fd == pollfds[i].fd
> When I generate a event, poll never seems to catch it.  I tried forcing 
> this by changing the timeout value to a minute.  And then generating the 
> event. I can see from the libvirt debug that the event has been generated:
>
> 15:00:34.232: debug : virEventRunOnce:567 : Poll got 1 event
> 15:00:34.239: debug : virEventDispatchHandles:450 : Dispatch n=2 f=8 w=3 
> e=1 0x7f2a57d6e6b0

 These messages are from qemud? So the event is generated on the other
 side of the "remote" pipe?

> In the eventAddHandle() call, we have:
>
> event.c(75): eventAddHandle
> event.c(82): ++++++++++++++watch->id is 0
> event.c(84): ++++++++++++++watch->fd is 11
> event.c(86): ++++++++++++++watch->events is 1
> event.c(88): ++++++++++++++watch->cb is 0x6b8f4c0, cb is 0x6b8f4c0
> event.c(90): ++++++++++++++watch->opaque is 0x7fffdc014900

 These seem reasonable, so we should have a list with one entry.

> I haven't tracked down what is happening here.
>
>
>> +                                    && !cur->deleted) {
>> +                                        invoke_callback(cur, &pollfds[i]);
>> +                                        break;
>> +                                }
>> +
>> +                free(pollfds);
>> +
>> +                event_watch_free_deleted();
>> +                event_timer_free_deleted();
>> +        }
>> +
>> +        return NULL;
>> +}

 Hmm, actually I'd expect the opposite: this code seems to *always*
 invoke the callback, even when no event is pending (oops!).

 Since that's not happening, it may be that the pollfds structure isn't
 being created properly, in particular the fd member. 
I should have said in my previous email - the value of pollfds[i].fd is 
always 0.  In looking at it more, I think the following is happening:

1) The CIMOM loads Richard's provider and init_events() is called.
2) This starts the event_thread.  At this point, we don't have a handle, 
so the following for loop fails:

                 for (pollfd = &pollfds[0], cur = watch_list;
                      cur != NULL;
                      pollfd++, cur = cur->next) {}
3) We call poll()
4) In the meantime, the provider's ActivateFilter() function has been 
called and the callback is registered
5) eventUpdateHandle() is called, but it doesn't update our pollfds list
5) We generate a domain event and libvirt dispatches it
5) Our fd in pollfds[i] is still 0, so we don't have a match.

event.c(388): init_events
[New Thread 0x7ffff61c6710 (LWP 22188)]
event.c(320): ^^^^^^^^^^in event_thread
event.c(330): ^^^^^^^^^^got pollfds
event.c(335): ^^^^^^^^^^cur is NULL!!
event.c(352): ^^^^^^^^^timeout is 30000
std_indication.c(231): Calling handler->activate_fn
Virt_GuestCrashAlertIndication.c(334): ActivateFilter for 
KVM_GuestCrashAlertIndication
misc_util.c(76): Connecting to libvirt with uri `qemu:///system'
event.c(75): eventAddHandle
event.c(82): ++++++++++++++watch->id is 0
event.c(84): ++++++++++++++watch->fd is 11
event.c(86): ++++++++++++++watch->events is 1
event.c(88): ++++++++++++++watch->cb is 0x6de44c0, cb is 0x6de44c0
event.c(90): ++++++++++++++watch->opaque is 0x7fffdc014900
event.c(174): eventAddTimeout
Virt_GuestCrashAlertIndication.c(338): Registering callback function
event.c(108): eventUpdateHandle 0
event.c(108): eventUpdateHandle 0
Virt_GuestCrashAlertIndication.c(343): ret val: 0

std_indication.c(270): IndicationVirt_GuestCrashAlertIndicationProvider: 
indications enabled
Virt_GuestCrashAlertIndication.c(365): EnableIndications
event.c(358): ^^^^^^^^^^loop through watch_count
event.c(360): ^^^^^^^^^^loop through watch_list
event.c(361): ==================cur->fd is 11, pollfds[i].fd is 0
event.c(362): ==================cur->deleted is 7

...

 Another possibility is that watch_count is somehow 0, which would also
 hose pollfds allocation...

 I guess strace output (for poll()), CU_DEBUG logs, and some gdb work
 would help.

-- 
Kaitlin Rupert
IBM Linux Technology Center
kaitlin(a)linux.vnet.ibm.com

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

Re: [Libvirt-cim] [PATCH] [RFC] Implement libvirt event callback management