On Tue, Aug 25, 2020 at 4:07 PM Daniel P. Berrangé <berrange(a)redhat.com> wrote:
On Tue, Aug 25, 2020 at 03:16:50PM +0200, Christian Ehrhardt wrote:
> Hi,
> I expect that this falls under the "with meson now everything is
> different anyway" umbrella but wanted to let you know about this as it
> affects v6.6 in at least Ubuntu/Debian.
>
> The following recent patch has broken libvirt-lxc for us:
> commit d7147b3797380de2d159ce6324536f3e1f2d97e3
> Author: Pavel Hrdina <phrdina(a)redhat.com>
> Date: Fri Jun 19 00:44:07 2020 +0200
> m4: virt-xdr: rewrite XDR check
>
> I was tracking that down for [1] since the tests [4] failed on me. [2]
> holds the backtrace.
> In Debian the tests are skipped which explains why they were not seen there:
> smoke-lxc SKIP Test requires machine-level isolation but testbed
> does not provide that
>
> What happens is that the libvirt_lxc segfaults when using XDR functions.
>
> dmesg shows:
> [582093.524644] libvirt_lxc[261446]: segfault at 0 ip 0000000000000000
> sp 00007ffdd2345598 error 14 in libvirt_lxc[5587e42aa000+8000]
> [582093.524650] Code: Bad RIP value.
>
> There are quite some uncertainties left, but on the surface it seems
> that it links with libtirpc but
> then instead of calling
> libtirpc: src/xdr.c:929:xdr_uint64_t(xdrs, ullp)
> it ends (gdb tells us in [2]) in glibc
> glibc: sunrpc/xdr_intXX_t.c:62:xdr_uint64_t (XDR *xdrs, uint64_t *uip)
>
> And the return from that function breaks it badly (instruction pointer
> at 0x0 -> segfault)
Right so that's a serious problem with clashing symbols between tirpc
and glibc.
In Fedora/RHEL it has been impossible to build against glibc for the XDR
symbols for a long time now. Glibc maintainers want everyone to be
using tirpc. The symbols are still exported from glibc, but they
should only be used by legacy apps built against older glibc.
Symbol versioning should ensure libvirt_lxc always resolves to the
libtirpc library
$ eu-readelf -a /usr/lib64/libc.so.6 | grep xdr_uint64 | grep GLOBAL
2017: 00000000001349c0 226 FUNC GLOBAL DEFAULT 15 xdr_uint64_t@GLIBC_2.2.5
$ eu-readelf -a /usr/lib64/libtirpc.so | grep xdr_uint64 | grep GLOBAL
344: 000000000001ce20 9 FUNC GLOBAL DEFAULT 14 xdr_uint64_t@@TIRPC_0.3.0
$ eu-readelf -a /usr/libexec/libvirt_lxc | grep xdr_uint64
0x0000000000024a30 X86_64_JUMP_SLOT 000000000000000000 +0 xdr_uint64_t
149: 0000000000000000 0 FUNC GLOBAL DEFAULT UNDEF xdr_uint64_t@TIRPC_0.3.0 (13)
ubuntu@groovy:~$ eu-readelf -a /lib/x86_64-linux-gnu/libc.so.6 | grep xdr_uint64 | grep GLOBAL
2019: 0000000000159ed0 228 FUNC GLOBAL DEFAULT 16 xdr_uint64_t@@GLIBC_2.2.5
ubuntu@groovy:~$ eu-readelf -a /lib/x86_64-linux-gnu/libtirpc.so.3.0.0 | grep xdr_uint64 | grep GLOBAL
343: 000000000001ae20 9 FUNC GLOBAL DEFAULT 15 xdr_uint64_t@@TIRPC_0.3.0
Ubuntu v6.0 builds
ubuntu@groovy:~$ eu-readelf -a /usr/lib/libvirt/libvirt_lxc | grep xdr_uint64
0x0000000000026820 X86_64_JUMP_SLOT 000000000000000000 +0 xdr_uint64_t
99: 0000000000000000 0 FUNC GLOBAL DEFAULT UNDEF xdr_uint64_t@GLIBC_2.2.5 (4)
[ 1c02] xdr_uint64_t
Ubuntu v6.6 builds
ubuntu@groovy:~$ eu-readelf -a /usr/lib/libvirt/libvirt_lxc | grep xdr_uint64
0x00000000000268d0 X86_64_JUMP_SLOT 000000000000000000 +0 xdr_uint64_t
104: 0000000000000000 0 FUNC GLOBAL DEFAULT UNDEF xdr_uint64_t@GLIBC_2.2.5 (4)
[ 1a81] xdr_uint64_t
Both lack the TIRPC_0.3.0 version entry - interesting.
libvirt 6.6 build from git on the same system:
$ eu-readelf -a libvirt/build/src/.libs/libvirt_lxc | grep xdr_uint64
0x0000000000028968 X86_64_JUMP_SLOT 000000000000000000 +0 xdr_uint64_t
99: 0000000000000000 0 FUNC GLOBAL DEFAULT UNDEF xdr_uint64_t@GLIBC_2.2.5 (3)
598: 0000000000000000 0 FUNC GLOBAL DEFAULT UNDEF xdr_uint64_t@@GLIBC_2.2.5
[ 31df] xdr_uint64_t@@GLIBC_2.2.5
[ 18f4] xdr_uint64_t
That is with
configure: xdr: yes (CFLAGS='-I/usr/include/tirpc' LIBS='-ltirpc')
So something is wrong at build time when glibc AND tirpc provide that symbol.
This shows libvirt_lxc will only resolve to libtirpc.
I see the Ubuntu package for glibc is passing --enable-obsolete-rpc which
allows apps to continue to build against glibc for RPC :-(
So I suspect somehow libvirt has ended up using tirpc headers, but the linker
probably resolved symbols to glibc.
As I wrote above, my builds don't get the TIRPC_0.3.0 entry in libvirt_lxc,
which seems to be the reason it then jumps to the wrong one.
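For anyone wanting to run the same check on their own binaries, here is a minimal sketch that reads symbol-table lines and reports which version an undefined symbol is bound to. It assumes output shaped like the eu-readelf lines in this thread; the helper names symbol_binding and resolves_to_tirpc are made up for illustration and are not part of libvirt or elfutils.

```python
import re

# Matches a "name@VERSION" or "name@@VERSION" token in a symbol-table line.
# "@@" marks the default version a library exports; "@" is a plain binding.
_VERSIONED = re.compile(r"(?P<name>\w+)(?P<at>@{1,2})(?P<version>[\w.]+)")


def symbol_binding(line):
    """Return (name, version, is_default) for one line, or None if unversioned."""
    match = _VERSIONED.search(line)
    if match is None:
        return None
    return (match["name"], match["version"], match["at"] == "@@")


def resolves_to_tirpc(lines, symbol="xdr_uint64_t"):
    """True if some UNDEF entry for `symbol` is bound to a TIRPC_* version."""
    for line in lines:
        if "UNDEF" not in line:
            continue  # only undefined (imported) symbols are of interest
        binding = symbol_binding(line)
        if binding and binding[0] == symbol and binding[1].startswith("TIRPC_"):
            return True
    return False


# Sample lines modeled on the Fedora and Ubuntu v6.6 output in this thread.
fedora = ["149: 0000000000000000 0 FUNC GLOBAL DEFAULT UNDEF xdr_uint64_t@TIRPC_0.3.0 (13)"]
ubuntu = ["104: 0000000000000000 0 FUNC GLOBAL DEFAULT UNDEF xdr_uint64_t@GLIBC_2.2.5 (4)"]

print(resolves_to_tirpc(fedora))  # True
print(resolves_to_tirpc(ubuntu))  # False
```

A False result for a binary that was configured with LIBS='-ltirpc' would point at the same header/library mismatch discussed here.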
I don't know how the linker decides which library to resolve symbols to
when multiple libraries provide the same symbol with different versions.
Possibly it tries them in order? I do recall that there were lots of
problems with having both glibc and libtirpc used in Fedora before glibc
introduced the ability to disable RPC via --disable-obsolete-rpc to
Did I mention that --enable-obsolete-rpc is a bad idea yet :-P
You are probably right, but that will be a different bug for a different day.
FWIW, you're going to be forced to stop using this arg because it has been
deleted entirely in glibc 2.32, so there's no way to compile against
glibc for XDR. Only existing built binaries will work.
By then at least it won't be able to link in the wrong one anymore :-)
And 2.32 is planned sometime soon for Ubuntu [1], so maybe I can do
the revert for a week and then drop it on a rebuild.
[1]:
https://discourse.ubuntu.com/t/groovy-gorilla-release-schedule/15531
--
Christian Ehrhardt
Staff Engineer, Ubuntu Server
Canonical Ltd