NB, even with this done there is still a latent bug affecting all
platforms. When we call g_source_destroy the removal is asynchronous,
but we usually close the FD synchronously. This can leave the event
loop thread poll()ing on an FD that has already been closed.

We've actually had this race in libvirt since day 1. Our previous
poll() event loop impl, before glib, also implemented the
virEventRemoveHandle call asynchronously, by just writing to a pipe
to interrupt the other thread in poll(), just as glib does.

We've always relied on parallelism to make this async call almost
instantaneous, but under the right load conditions we trigger the
POLLNVAL / EBADF issue.
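
For illustration only, here is a minimal single-threaded sketch of what
the event loop thread sees once the FD has been closed underneath it.
The helper name poll_closed_fd_revents is hypothetical and the close()
simply stands in for the caller that wins the race; this is not libvirt
code.

```c
#include <poll.h>
#include <unistd.h>

/*
 * Close the read end of a fresh pipe, then poll() on the stale FD
 * number. Returns the revents reported by poll(), or -1 if pipe()
 * or poll() itself failed.
 */
static int poll_closed_fd_revents(void)
{
    int fds[2];
    struct pollfd pfd;

    if (pipe(fds) < 0)
        return -1;

    pfd.fd = fds[0];
    pfd.events = POLLIN;
    pfd.revents = 0;

    /* Stand-in for the synchronous close() that runs before the
     * asynchronous source removal has taken effect. */
    close(fds[0]);

    /* The FD is still in the poll set, as it would be in the loop. */
    if (poll(&pfd, 1, 0) < 0) {
        close(fds[1]);
        return -1;
    }

    close(fds[1]);
    return pfd.revents;
}
```

On a closed FD, poll() sets POLLNVAL in the returned revents rather
than failing the call as a whole.
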
The only viable solution to this that I see is to only ever
call g_source_destroy + g_source_unref from an idle callback,
to guarantee that poll() isn't currently running.
We know this has a bit of a perf hit on code that is sensitive
to main loop iterations, so we tried to avoid it where possible
right now:

  https://listman.redhat.com/archives/libvir-list/2020-November/212411.html

I think we'll need to revisit this though, as known EBADF problems
are not good.

Daniel P. Berrangé (2):
  ci: print stack traces on macOS if any tests fail
  tests: don't set G_DEBUG=fatal-warnings on macOS

 ci/cirrus/build.yml |  2 +-
 tests/meson.build   | 17 ++++++++++++++++-
 2 files changed, 17 insertions(+), 2 deletions(-)

-- 
2.35.1