[libvirt] [BUG] EPOLL_CLOEXEC undeclared

Hi,

$ cat /etc/debian_version
5.0.1
$ dpkg -S eventpoll.h
linux-libc-dev: /usr/include/linux/eventpoll.h
$ dpkg-query -W linux-libc-dev
linux-libc-dev 2.6.32-35~ucs1.48.201109051614
$ git describe
v0.9.9-57-g7eb9cfd
$ ./autogen.sh ; make
...
(cd .libs && rm -f libvirt_test.la && ln -s ../libvirt_test.la libvirt_test.la)
gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I.. -I../gnulib/lib -I../gnulib/lib -I../include -I../src/util -I../include -DIN_LIBVIRT -I../src/conf -I/usr/include/libxml2 -Wall -W -Wformat-y2k -Wformat-security -Winit-self -Wmissing-include-dirs -Wunused -Wunknown-pragmas -Wstrict-aliasing -Wshadow -Wpointer-arith -Wbad-function-cast -Wcast-align -Wwrite-strings -Wlogical-op -Waggregate-return -Wstrict-prototypes -Wold-style-definition -Wmissing-prototypes -Wmissing-declarations -Wmissing-noreturn -Wmissing-format-attribute -Wredundant-decls -Wnested-externs -Winline -Winvalid-pch -Wvolatile-register-var -Wdisabled-optimization -Wattributes -Wcoverage-mismatch -Wmultichar -Wdeprecated-declarations -Wdiv-by-zero -Wendif-labels -Wextra -Wformat-contains-nul -Wformat-extra-args -Wformat-zero-length -Wformat=2 -Wmultichar -Wnormalized=nfc -Woverflow -Wpointer-to-int-cast -Wpragmas -Wno-missing-field-initializers -Wno-sign-compare -Wno-format-nonliteral -Wp,-D_FORTIFY_SOURCE=2 -fstack-protector-all --param=ssp-buffer-size=4 -fexceptions -fasynchronous-unwind-tables -fdiagnostics-show-option -funit-at-a-time -fipa-pure-const -g -O2 -MT libvirt_lxc-lxc_controller.o -MD -MP -MF .deps/libvirt_lxc-lxc_controller.Tpo -c -o libvirt_lxc-lxc_controller.o `test -f 'lxc/lxc_controller.c' || echo './'`lxc/lxc_controller.c
lxc/lxc_controller.c: In function 'lxcControllerMain':
lxc/lxc_controller.c:1176: warning: implicit declaration of function 'epoll_create1'
lxc/lxc_controller.c:1176: warning: nested extern declaration of 'epoll_create1' [-Wnested-externs]
lxc/lxc_controller.c:1176: error: 'EPOLL_CLOEXEC' undeclared (first use in this function)
lxc/lxc_controller.c:1176: error: (Each undeclared identifier is reported only once
lxc/lxc_controller.c:1176: error: for each function it appears in.)
make[3]: *** [libvirt_lxc-lxc_controller.o] Error 1
make[3]: Leaving directory `/root/libvirt/src'
...
$ git show v0.9.9-32-g9130396 | head -5
commit 9130396214975ba2251082f943c9717281039050
Author: Daniel P. Berrange <berrange@redhat.com>
Date: Thu Jan 12 17:03:03 2012 +0000

    Re-write LXC controller end-of-file I/O handling yet again

Sincerely
Philipp
--
Philipp Hahn Open Source Software Engineer hahn@univention.de
Univention GmbH Linux for Your Business fon: +49 421 22 232- 0
Mary-Somerville-Str.1 D-28359 Bremen fax: +49 421 22 232-99
http://www.univention.de/

On 01/16/2012 08:07 AM, Philipp Hahn wrote:
> Hi,
>
> $ cat /etc/debian_version
> 5.0.1
> $ dpkg -S eventpoll.h
> linux-libc-dev: /usr/include/linux/eventpoll.h
> $ dpkg-query -W linux-libc-dev
> linux-libc-dev 2.6.32-35~ucs1.48.201109051614

glibc has supported epoll_create1() and EPOLL_CLOEXEC since glibc 2.9.

On my Fedora systems, I see that even eventpoll.h from Fedora 12 kernel-headers-2.6.32.26-175.fc12.i686 had

#define EPOLL_CLOEXEC O_CLOEXEC

But I _also_ see that /usr/include/sys/epoll.h has EPOLL_CLOEXEC (as well as EPOLL_NONBLOCK) defined; in my case on glibc-headers-2.11.2-3.i686. Our file includes <sys/epoll.h>, not <linux/eventpoll.h>; so now the trick is to figure out why your <sys/epoll.h> seems to be so old as to lack EPOLL_CLOEXEC.

Meanwhile, given that Linux has tied EPOLL_CLOEXEC to O_CLOEXEC, and further that EPOLL_CLOEXEC doesn't exist on any other systems, I think the trivial patch would be to use O_CLOEXEC instead, along with a comment why we don't use the documented interface (in order to cater to older systems). Would you like to submit the patch for that?

--
Eric Blake eblake@redhat.com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
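
The O_CLOEXEC suggestion above could look roughly like the sketch below. It is only an illustration, not code from libvirt: the helper names epoll_create_with_o_cloexec() and fd_is_cloexec() are hypothetical, and the sketch assumes a Linux system where EPOLL_CLOEXEC and O_CLOEXEC share one value, as the quoted kernel header implies. Note that it still depends on libc providing epoll_create1().

```c
#define _GNU_SOURCE
#include <fcntl.h>      /* O_CLOEXEC, fcntl(), FD_CLOEXEC */
#include <sys/epoll.h>  /* epoll_create1() */

/* Hypothetical helper: pass O_CLOEXEC rather than EPOLL_CLOEXEC,
 * relying on Linux defining EPOLL_CLOEXEC as O_CLOEXEC. */
int epoll_create_with_o_cloexec(void)
{
    return epoll_create1(O_CLOEXEC);
}

/* Hypothetical helper: returns 1 if fd has close-on-exec set, else 0. */
int fd_is_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    return flags >= 0 && (flags & FD_CLOEXEC) != 0;
}
```

Checking the result with fd_is_cloexec() is one way to confirm that the substituted flag had the intended effect on a given system.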

Hello Eric,

thank you for taking a look at my problem.

On Monday 16 January 2012 16:36:00 Eric Blake wrote:
> On 01/16/2012 08:07 AM, Philipp Hahn wrote:
> > $ dpkg-query -W linux-libc-dev
> > linux-libc-dev 2.6.32-35~ucs1.48.201109051614
>
> glibc has supported epoll_create1() and EPOLL_CLOEXEC since glibc 2.9.

That's the problem on this (old) Debian Lenny system:

# dpkg-query -W libc6-dev
libc6-dev 2.7-18.32.201101241735

> #define EPOLL_CLOEXEC O_CLOEXEC

Adding that define doesn't solve the problem; the next errors are

lxc/lxc_controller.c: In function ‘lxcControllerMain’:
lxc/lxc_controller.c:1176: warning: implicit declaration of function ‘epoll_create1’
lxc/lxc_controller.c:1176: warning: nested extern declaration of ‘epoll_create1’ [-Wnested-externs]

# objdump -T /lib/libc-2.7.so | grep epoll_create
00000000000cfb60 g DF .text 0000000000000025 GLIBC_2.3.2 epoll_create

From a different (newer) Debian Squeeze system:

# objdump -T /lib/libc-2.11.2.so | grep epoll_create
000cc020 g DF .text 00000034 GLIBC_2.3.2 epoll_create
000cc060 g DF .text 00000034 GLIBC_2.9 epoll_create1

> Would you like to submit the patch for that?

For me this looks like lxc now only works with glibc >= 2.9, so an appropriate check in configure should be added? Or a fall-back to epoll_create()?

diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c
index 49727dd..bb36d91 100644
--- a/src/lxc/lxc_controller.c
+++ b/src/lxc/lxc_controller.c
@@ -1173,7 +1173,11 @@ static int lxcControllerMain(int serverFd,
         consoles[i].hostFd = hostFds[i];
         consoles[i].contFd = contFds[i];
+#ifdef EPOLL_CLOEXEC
         if ((consoles[i].epollFd = epoll_create1(EPOLL_CLOEXEC)) < 0) {
+#else
+        if ((consoles[i].epollFd = epoll_create(0)) < 0 || virSetInherit(consoles[i].epollFd, false) < 0) {
+#endif
             virReportSystemError(errno, "%s", _("Unable to create epoll fd"));
             goto cleanup;

Yes, I know it's ugly and not 100% thread/signal/async safe, but at least it compiles again with older libc versions.

Sincerely
Philipp
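
The fallback idea in the patch above can be tried in isolation with the sketch below. It is a hedged illustration, not libvirt code: the function names are hypothetical, and plain fcntl() stands in for libvirt's virSetInherit(), whose exact behaviour is not shown in this thread. One difference from the patch is deliberate: epoll_create(2) documents that its size argument is ignored by current kernels but must be greater than zero, so the sketch passes 1 rather than 0. As noted in the mail, this path is not atomic: another thread could fork and exec between the two calls.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Fallback path: plain epoll_create() followed by fcntl(FD_CLOEXEC).
 * The size hint must be greater than zero even though it is ignored. */
int epoll_create_cloexec_fallback(void)
{
    int fd = epoll_create(1);
    if (fd < 0)
        return -1;

    int flags = fcntl(fd, F_GETFD);
    if (flags < 0 || fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0) {
        int saved_errno = errno;   /* keep the original error across close() */
        close(fd);
        errno = saved_errno;
        return -1;
    }
    return fd;
}

/* Prefer the atomic call whenever the headers declare EPOLL_CLOEXEC. */
int epoll_create_cloexec(void)
{
#ifdef EPOLL_CLOEXEC
    return epoll_create1(EPOLL_CLOEXEC);
#else
    return epoll_create_cloexec_fallback();
#endif
}
```

A caller would use epoll_create_cloexec() wherever the controller currently calls epoll_create1(EPOLL_CLOEXEC) and treat a negative return value as an error with errno set.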

On 01/17/2012 04:42 AM, Philipp Hahn wrote:
> > glibc has supported epoll_create1() and EPOLL_CLOEXEC since glibc 2.9.
>
> That's the problem on this (old) Debian Lenny system:
>
> # dpkg-query -W libc6-dev
> libc6-dev 2.7-18.32.201101241735

And what kernel is that system running? We already refuse to compile lxc for RHEL 5 (kernel 2.6.18), as that particular kernel is too old to usefully support namespace and other operations required by lxc. It's one thing if the kernel call exists, but glibc is too old to expose it, and another thing altogether if the kernel is too old to support it.

> For me this looks like lxc now only works with glibc >= 2.9, so an appropriate check in configure should be added?

Yes, this sounds best to me. Daniel, what is the minimum version of kernel that you are willing to support for LXC, and given that, what is the best thing to probe for in configure.ac to determine whether it even makes sense to compile LXC? We already reject kernels that lack unshare() (dates back to 2.6.16) and LO_FLAGS_AUTOCLEAR (dates back to Oct 2007, but I'm not sure which kernel introduced it).

> Or a fall-back to epoll_create()?

I'd prefer to reject old kernels, rather than add a fallback, if we suspect anything else in our LXC implementation to not work with a kernel that old.

Hello Eric,

Eric Blake <eblake@redhat.com> wrote:
> On 01/17/2012 04:42 AM, Philipp Hahn wrote:
> And what kernel is that system running?

2.6.32

> We already refuse to compile lxc for RHEL 5 (kernel 2.6.18), as that particular kernel is too old to usefully support namespace and other operations required by lxc.

I haven't used lxc on that particular system, but only tried to compile that newer version. configure did detect lxc itself and then failed building it, so that was the only thing to get me started looking at that issue. Since I don't need lxc on that system, I'm fine with just disabling lxc.

Sincerely
Philipp

On 01/18/2012 01:01 AM, Philipp Hahn wrote:
> Hello Eric,
>
> Eric Blake <eblake@redhat.com> wrote:
> > On 01/17/2012 04:42 AM, Philipp Hahn wrote:
> > And what kernel is that system running?
>
> 2.6.32

That's an unusual mix, where the syscall exists (since 2.6.27) but libc is too old to use the syscall.
> > We already refuse to compile lxc for RHEL 5 (kernel 2.6.18), as that particular kernel is too old to usefully support namespace and other operations required by lxc.
>
> I haven't used lxc on that particular system, but only tried to compile that newer version. configure did detect lxc itself and then failed building it, so that was the only thing to get me started looking at that issue. Since I don't need lxc on that system, I'm fine with just disabling lxc.

Then how about this patch:

From 330f666036943a0fc423a4b5db2ca294fb2a4298 Mon Sep 17 00:00:00 2001
From: Eric Blake <eblake@redhat.com>
Date: Thu, 19 Jan 2012 13:35:39 -0700
Subject: [PATCH] build: skip lxc with too-old glibc

Since we already require the kernel to be new enough to support
LO_FLAGS_AUTOCLEAR, we might as well also require glibc to be
new enough to support epoll_create1().

* configure.ac (with_lxc): We require glibc 2.9 for LXC.
Reported by Philipp Hahn.
---
 configure.ac | 7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/configure.ac b/configure.ac
index 729ba9b..02d60d0 100644
--- a/configure.ac
+++ b/configure.ac
@@ -734,16 +734,17 @@ if test "$with_lxc" = "yes" || test "$with_lxc" = "check"; then
     AC_TRY_LINK([
         #include <sched.h>
         #include <linux/loop.h>
+        #include <sys/epoll.h>
     ], [
-        unshare (!LO_FLAGS_AUTOCLEAR);
+        unshare (!(LO_FLAGS_AUTOCLEAR + EPOLL_CLOEXEC));
     ], [
         with_lxc=yes
     ], [
         if test "$with_lxc" = "check"; then
             with_lxc=no
-            AC_MSG_NOTICE([Function unshare() not present in <sched.h> header but required for LXC driver, disabling it])
+            AC_MSG_NOTICE([Required kernel features were not found, disabling LXC])
         else
-            AC_MSG_ERROR([Function unshare() not present in <sched.h> header, but required for LXC driver])
+            AC_MSG_ERROR([Required kernel features for LXC were not found])
         fi
     ])
 fi
--
1.7.7.5
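
To see what the configure check in the patch above exercises, the probe can be approximated as a standalone C file. This is a sketch under assumptions, not the program autoconf actually generates: lxc_feature_probe() is a hypothetical stand-in for the main() that AC_TRY_LINK wraps around the body, and the _GNU_SOURCE define is an addition here so that glibc declares unshare(). The single expression references unshare(), LO_FLAGS_AUTOCLEAR and EPOLL_CLOEXEC, so the build fails if any one of them is missing.

```c
#define _GNU_SOURCE      /* assumption: makes glibc declare unshare() */
#include <sched.h>       /* unshare() */
#include <linux/loop.h>  /* LO_FLAGS_AUTOCLEAR */
#include <sys/epoll.h>   /* EPOLL_CLOEXEC */

/* Stand-in for the body that the configure test compiles and links.
 * Both constants are non-zero, so the argument evaluates to 0; the
 * function exists to be compiled and linked, not to be run. */
int lxc_feature_probe(void)
{
    return unshare(!(LO_FLAGS_AUTOCLEAR + EPOLL_CLOEXEC));
}
```

On the Debian Lenny system from this thread, a file like this would be expected to fail on the undeclared EPOLL_CLOEXEC, which is what lets configure disable LXC before the real build starts.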

Hello Eric,

On Thursday 19 January 2012 21:38:22 Eric Blake wrote:
> That's an unusual mix, where the syscall exists (since 2.6.27) but libc is too old to use the syscall.

The original kernel was 2.6.26, but it was later updated to 2.6.32 for better support of newer hardware.

> Then how about this patch:
>
> From 330f666036943a0fc423a4b5db2ca294fb2a4298 Mon Sep 17 00:00:00 2001
> From: Eric Blake <eblake@redhat.com>
> Date: Thu, 19 Jan 2012 13:35:39 -0700
> Subject: [PATCH] build: skip lxc with too-old glibc
> ...

Looks good: Now the missing support is detected and LXC is automatically disabled or refuses to compile. I did NOT test if it still compiles on newer systems.

> Reported by Philipp Hahn.

You may add a "Tested-by: me" if needed.

Thank you for your work and the patch.

BYtE
Philipp

On 01/23/2012 02:48 AM, Philipp Hahn wrote:
> > Then how about this patch:
> >
> > From 330f666036943a0fc423a4b5db2ca294fb2a4298 Mon Sep 17 00:00:00 2001
> > From: Eric Blake <eblake@redhat.com>
> > Date: Thu, 19 Jan 2012 13:35:39 -0700
> > Subject: [PATCH] build: skip lxc with too-old glibc
> > ...
>
> Looks good: Now the missing support is detected and LXC is automatically disabled or refuses to compile. I did NOT test if it still compiles on newer systems.

But I did test on newer systems :)

> > Reported by Philipp Hahn.
>
> You may add a "Tested-by: me" if needed.
>
> Thank you for your work and the patch.

Thanks; patch now pushed.
participants (2)
- Eric Blake
- Philipp Hahn