[libvirt] 'stack smashing detected' in 1.2.18 (caused by virNetDevGFeatureAvailable)

I recently compiled 1.2.18 to start testing with it, and was getting this error on startup: *** stack smashing detected ***: libvirtd terminated ======= Backtrace: ========= /lib64/libc.so.6(__fortify_fail+0x37)[0x7fe1ac631527] /lib64/libc.so.6(__fortify_fail+0x0)[0x7fe1ac6314f0] //lib/libvirt.so.0(+0xa7927)[0x7fe1aeda2927] //lib/libvirt/connection-driver/libvirt_driver_nodedev.so(+0x947d)[0x7fe1958a047d] //lib/libvirt/connection-driver/libvirt_driver_nodedev.so(+0xa6c2)[0x7fe1958a16c2] //lib/libvirt/connection-driver/libvirt_driver_nodedev.so(+0xaf4e)[0x7fe1958a1f4e] //lib/libvirt.so.0(virStateInitialize+0xb8)[0x7fe1aee6d0a8] libvirtd(+0x15120)[0x7fe1afae6120] //lib/libvirt.so.0(+0xd4975)[0x7fe1aedcf975] /lib64/libpthread.so.0(+0x30316079d1)[0x7fe1ada8c9d1] /lib64/libc.so.6(clone+0x6d)[0x7fe1ac6178fd] (gdb) bt #0 0x00007ffff4a8f625 in raise () from /lib64/libc.so.6 #1 0x00007ffff4a90e05 in abort () from /lib64/libc.so.6 #2 0x00007ffff4acd537 in __libc_message () from /lib64/libc.so.6 #3 0x00007ffff4b5f527 in __fortify_fail () from /lib64/libc.so.6 #4 0x00007ffff4b5f4f0 in __stack_chk_fail () from /lib64/libc.so.6 #5 0x00007ffff72d0927 in virNetDevGetFeatures (ifname=<value optimized out>, out=<value optimized out>) at util/virnetdev.c:3200 #6 0x00007fffdddce47d in udevProcessNetworkInterface (device=0x7fffd4071f70, def=0x6) at node_device/node_device_udev.c:694 #7 udevGetDeviceDetails (device=0x7fffd4071f70, def=0x6) at node_device/node_device_udev.c:1272 #8 0x00007fffdddcf6c2 in udevAddOneDevice (device=0x7fffd4071f70) at node_device/node_device_udev.c:1394 #9 0x00007fffdddcff4e in udevProcessDeviceListEntry (privileged=<value optimized out>, callback=<value optimized out>, opaque=<value optimized out>) at node_device/node_device_udev.c:1433 #10 udevEnumerateDevices (privileged=<value optimized out>, callback=<value optimized out>, opaque=<value optimized out>) at node_device/node_device_udev.c:1463 #11 nodeStateInitialize (privileged=<value optimized out>, callback=<value optimized out>, opaque=<value optimized out>) at node_device/node_device_udev.c:1773 #12 0x00007ffff739b0a8 in virStateInitialize (privileged=true, callback=0x555555569070 <daemonInhibitCallback>, opaque=0x5555557f1db0) at libvirt.c:777 #13 0x0000555555569120 in daemonRunStateInit (opaque=<value optimized out>) at libvirtd.c:947 #14 0x00007ffff72fd975 in virThreadHelper (data=<value optimized out>) at util/virthread.c:206 #15 0x00007ffff5fba9d1 in start_thread () from /lib64/libpthread.so.0 #16 0x00007ffff4b458fd in clone () from /lib64/libc.so.6 In IRC, we tracked this down to this bit of code: g_cmd.cmd = ETHTOOL_GFEATURES; g_cmd.size = GFEATURES_SIZE; if (virNetDevGFeatureAvailable(ifname, &g_cmd)) ignore_value(virBitmapSetBit(*out, VIR_NET_DEV_FEAT_TXUDPTNL)); GFEATURES_SIZE is currently defined as 2, but this value needs to be higher in order to support newer kernels. It looks like this code was added in ac3ed2085fcbeecaf5aa347c0b1bffaf94fff293 ethtool calculates this value based on the number of supported features: http://lxr.free-electrons.com/source/net/core/ethtool.c#L55 I don't know enough about this to properly fix this, but raising GFEATURES_SIZE to 3 has fixed this issue for me (though, this will obviously need to go higher as more features get added) This crash was occurring on a CentOS 6 system, running a the ELRepo kernel-ml kernel. The stock CentOS 6 kernel (2.6.32) does not appear to have sufficient features available to trigger this.

On 08/05/2015 12:09 PM, Brian Rak wrote:
I recently compiled 1.2.18 to start testing with it, and was getting this error on startup:
*** stack smashing detected ***: libvirtd terminated ======= Backtrace: ========= /lib64/libc.so.6(__fortify_fail+0x37)[0x7fe1ac631527] /lib64/libc.so.6(__fortify_fail+0x0)[0x7fe1ac6314f0] //lib/libvirt.so.0(+0xa7927)[0x7fe1aeda2927] //lib/libvirt/connection-driver/libvirt_driver_nodedev.so(+0x947d)[0x7fe1958a047d]
//lib/libvirt/connection-driver/libvirt_driver_nodedev.so(+0xa6c2)[0x7fe1958a16c2]
//lib/libvirt/connection-driver/libvirt_driver_nodedev.so(+0xaf4e)[0x7fe1958a1f4e]
//lib/libvirt.so.0(virStateInitialize+0xb8)[0x7fe1aee6d0a8] libvirtd(+0x15120)[0x7fe1afae6120] //lib/libvirt.so.0(+0xd4975)[0x7fe1aedcf975] /lib64/libpthread.so.0(+0x30316079d1)[0x7fe1ada8c9d1] /lib64/libc.so.6(clone+0x6d)[0x7fe1ac6178fd]
(gdb) bt #0 0x00007ffff4a8f625 in raise () from /lib64/libc.so.6 #1 0x00007ffff4a90e05 in abort () from /lib64/libc.so.6 #2 0x00007ffff4acd537 in __libc_message () from /lib64/libc.so.6 #3 0x00007ffff4b5f527 in __fortify_fail () from /lib64/libc.so.6 #4 0x00007ffff4b5f4f0 in __stack_chk_fail () from /lib64/libc.so.6 #5 0x00007ffff72d0927 in virNetDevGetFeatures (ifname=<value optimized out>, out=<value optimized out>) at util/virnetdev.c:3200 #6 0x00007fffdddce47d in udevProcessNetworkInterface (device=0x7fffd4071f70, def=0x6) at node_device/node_device_udev.c:694 #7 udevGetDeviceDetails (device=0x7fffd4071f70, def=0x6) at node_device/node_device_udev.c:1272 #8 0x00007fffdddcf6c2 in udevAddOneDevice (device=0x7fffd4071f70) at node_device/node_device_udev.c:1394 #9 0x00007fffdddcff4e in udevProcessDeviceListEntry (privileged=<value optimized out>, callback=<value optimized out>, opaque=<value optimized out>) at node_device/node_device_udev.c:1433 #10 udevEnumerateDevices (privileged=<value optimized out>, callback=<value optimized out>, opaque=<value optimized out>) at node_device/node_device_udev.c:1463 #11 nodeStateInitialize (privileged=<value optimized out>, callback=<value optimized out>, opaque=<value optimized out>) at node_device/node_device_udev.c:1773 #12 0x00007ffff739b0a8 in virStateInitialize (privileged=true, callback=0x555555569070 <daemonInhibitCallback>, opaque=0x5555557f1db0) at libvirt.c:777 #13 0x0000555555569120 in daemonRunStateInit (opaque=<value optimized out>) at libvirtd.c:947 #14 0x00007ffff72fd975 in virThreadHelper (data=<value optimized out>) at util/virthread.c:206 #15 0x00007ffff5fba9d1 in start_thread () from /lib64/libpthread.so.0 #16 0x00007ffff4b458fd in clone () from /lib64/libc.so.6
In IRC, we tracked this down to this bit of code:
g_cmd.cmd = ETHTOOL_GFEATURES; g_cmd.size = GFEATURES_SIZE; if (virNetDevGFeatureAvailable(ifname, &g_cmd)) ignore_value(virBitmapSetBit(*out, VIR_NET_DEV_FEAT_TXUDPTNL));
GFEATURES_SIZE is currently defined as 2, but this value needs to be higher in order to support newer kernels. It looks like this code was added in ac3ed2085fcbeecaf5aa347c0b1bffaf94fff293
ethtool calculates this value based on the number of supported features: http://lxr.free-electrons.com/source/net/core/ethtool.c#L55
I don't know enough about this to properly fix this, but raising GFEATURES_SIZE to 3 has fixed this issue for me (though, this will obviously need to go higher as more features get added)
(in later IRC conversation, Brian noted that raising GFEATURES_SIZE *didn't* always eliminate the issue...) The problem goes beyond that: 1) as far as I can see, g_cmd.size needs to be set to the number of items in the array g_cmd.feature, and we're setting it to 2 (GFEATURES_SIZE), but we have allocated space for exactly *0* items in that array. If we're going to tell the kernel we have 2 items in the array, we need to actually have that space available, or the kernel will overwrite something else. 2) the feature we're looking for is called "TX_UDP_TNL" in libvirt, and is manually #defined to be bit 25. From the title of the commit log for the patch that added this code to libvirt, you can see what we want to check for is the feature called "tx-udp_tnl-segmentation", and if you look at the ethtool.c source from the kernel that Brian has linked to above, you'll see that the netdev_features_strings[NETIF_F_GSO_UDP_TUNNEL_BIT] is initialized to "tx-upd_tnl-segmentation". When you look up NETIF_F_GSO_UDP_TUNNEL_BIT, it seems to be *26* in the enum where it is defined: http://osxr.org/linux/source/include/linux/netdev_features.h#0047 So are we checking for the wrong feature? It would be "really nice" if we could avoid #defining magic values like TX_UDP_TNL and GFEATURES_SIZE in our source. In my quick investigation I couldn't see a way around that though (since NETIF_GSO_UDP_TUNNEL_BIT) isn't available outside kernel source).
This crash was occurring on a CentOS 6 system, running a the ELRepo kernel-ml kernel. The stock CentOS 6 kernel (2.6.32) does not appear to have sufficient features available to trigger this.
I guess it would depend on whether or not ETHTOOL_GFEATURES is defined for the 2.6 kernels and, if so, then what was being overwritten beyond the end of g_cmd. (there are other locals defined both before and after g_cmd; all are used only *before* g_cmd is used. I'm not sure what order locals are in memory, so I don't know which are being overwritten.)

Hi Brian, I added a new function to calculate the gfeature size that is supported by the kernel. I tested on my setup and it return 2 for me, can you check that it returns 3 in your case. Thanks, Moshe Levi.
-----Original Message----- From: libvir-list-bounces@redhat.com [mailto:libvir-list- bounces@redhat.com] On Behalf Of Brian Rak Sent: Wednesday, August 05, 2015 7:09 PM To: libvir-list@redhat.com Subject: [libvirt] 'stack smashing detected' in 1.2.18 (caused by virNetDevGFeatureAvailable)
I recently compiled 1.2.18 to start testing with it, and was getting this error on startup:
*** stack smashing detected ***: libvirtd terminated ======= Backtrace: ========= /lib64/libc.so.6(__fortify_fail+0x37)[0x7fe1ac631527] /lib64/libc.so.6(__fortify_fail+0x0)[0x7fe1ac6314f0] //lib/libvirt.so.0(+0xa7927)[0x7fe1aeda2927] //lib/libvirt/connection- driver/libvirt_driver_nodedev.so(+0x947d)[0x7fe1958a047d] //lib/libvirt/connection- driver/libvirt_driver_nodedev.so(+0xa6c2)[0x7fe1958a16c2] //lib/libvirt/connection- driver/libvirt_driver_nodedev.so(+0xaf4e)[0x7fe1958a1f4e] //lib/libvirt.so.0(virStateInitialize+0xb8)[0x7fe1aee6d0a8] libvirtd(+0x15120)[0x7fe1afae6120] //lib/libvirt.so.0(+0xd4975)[0x7fe1aedcf975] /lib64/libpthread.so.0(+0x30316079d1)[0x7fe1ada8c9d1] /lib64/libc.so.6(clone+0x6d)[0x7fe1ac6178fd]
(gdb) bt #0 0x00007ffff4a8f625 in raise () from /lib64/libc.so.6 #1 0x00007ffff4a90e05 in abort () from /lib64/libc.so.6 #2 0x00007ffff4acd537 in __libc_message () from /lib64/libc.so.6 #3 0x00007ffff4b5f527 in __fortify_fail () from /lib64/libc.so.6 #4 0x00007ffff4b5f4f0 in __stack_chk_fail () from /lib64/libc.so.6 #5 0x00007ffff72d0927 in virNetDevGetFeatures (ifname=<value optimized out>, out=<value optimized out>) at util/virnetdev.c:3200 #6 0x00007fffdddce47d in udevProcessNetworkInterface (device=0x7fffd4071f70, def=0x6) at node_device/node_device_udev.c:694 #7 udevGetDeviceDetails (device=0x7fffd4071f70, def=0x6) at node_device/node_device_udev.c:1272 #8 0x00007fffdddcf6c2 in udevAddOneDevice (device=0x7fffd4071f70) at node_device/node_device_udev.c:1394 #9 0x00007fffdddcff4e in udevProcessDeviceListEntry (privileged=<value optimized out>, callback=<value optimized out>, opaque=<value optimized out>) at node_device/node_device_udev.c:1433 #10 udevEnumerateDevices (privileged=<value optimized out>, callback=<value optimized out>, opaque=<value optimized out>) at node_device/node_device_udev.c:1463 #11 nodeStateInitialize (privileged=<value optimized out>, callback=<value optimized out>, opaque=<value optimized out>) at node_device/node_device_udev.c:1773 #12 0x00007ffff739b0a8 in virStateInitialize (privileged=true, callback=0x555555569070 <daemonInhibitCallback>, opaque=0x5555557f1db0) at libvirt.c:777 #13 0x0000555555569120 in daemonRunStateInit (opaque=<value optimized out>) at libvirtd.c:947 #14 0x00007ffff72fd975 in virThreadHelper (data=<value optimized out>) at util/virthread.c:206 #15 0x00007ffff5fba9d1 in start_thread () from /lib64/libpthread.so.0 #16 0x00007ffff4b458fd in clone () from /lib64/libc.so.6
In IRC, we tracked this down to this bit of code:
g_cmd.cmd = ETHTOOL_GFEATURES; g_cmd.size = GFEATURES_SIZE; if (virNetDevGFeatureAvailable(ifname, &g_cmd)) ignore_value(virBitmapSetBit(*out, VIR_NET_DEV_FEAT_TXUDPTNL));
GFEATURES_SIZE is currently defined as 2, but this value needs to be higher in order to support newer kernels. It looks like this code was added in ac3ed2085fcbeecaf5aa347c0b1bffaf94fff293
ethtool calculates this value based on the number of supported features: http://lxr.free-electrons.com/source/net/core/ethtool.c#L55
I don't know enough about this to properly fix this, but raising GFEATURES_SIZE to 3 has fixed this issue for me (though, this will obviously need to go higher as more features get added)
This crash was occurring on a CentOS 6 system, running a the ELRepo kernel- ml kernel. The stock CentOS 6 kernel (2.6.32) does not appear to have sufficient features available to trigger this.
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On 08/06/2015 05:18 AM, Moshe Levi wrote:
Hi Brian,
I added a new function to calculate the gfeature size that is supported by the kernel. I tested on my setup and it return 2 for me, can you check that it returns 3 in your case.
That is useful, but doesn't fix the memory corruption problem. ethtool_gfeatures is defined like this: struct ethtool_gfeatures { __u32 cmd; __u32 size; struct ethtool_get_features_block features[0]; }; (so features is the *start* of an array of ethtool_get_features_block, but it has 0 elements) and we define our instance of it as a local with no extra space at the end for the features array: struct ethtool_gfeatures g_cmd = { 0 }; Then we call the ioctl telling it that the features array has 2 elements. So the kernel thinks that it can write two elements worth of features at the end, which ends up writing over whatever other local is next on the stack (for 2 * sizeof(ethtool_get_features_block) = 32 bytes). It will do this whether we tell it there are 2 elements, 3 elements, or 100 elements, because we haven't reserved the memory for those elements. To fix it, g_cmd needs to be allocated as a char* with sizeof(ethtool_gfeatures) + (2 * sizeof(ethtool_get_gfeatures_block)) elements, then freed on exit from the function (I actually don't see why g_cmd is defined in virNetDevGetFeatures() but only used in virNetDevGFeatureAcailable(), but that's a style issue rather than something causing a crash, so it's less important). I was hoping to make a patch for this last night, but jet lag caught up with me before I could do it. I'm still planning to when I get back from some errands in an hour or two, in case another patch for it hasn't been sent by then.
Thanks, Moshe Levi.
-----Original Message----- From: libvir-list-bounces@redhat.com [mailto:libvir-list- bounces@redhat.com] On Behalf Of Brian Rak Sent: Wednesday, August 05, 2015 7:09 PM To: libvir-list@redhat.com Subject: [libvirt] 'stack smashing detected' in 1.2.18 (caused by virNetDevGFeatureAvailable)
I recently compiled 1.2.18 to start testing with it, and was getting this error on startup:
*** stack smashing detected ***: libvirtd terminated ======= Backtrace: ========= /lib64/libc.so.6(__fortify_fail+0x37)[0x7fe1ac631527] /lib64/libc.so.6(__fortify_fail+0x0)[0x7fe1ac6314f0] //lib/libvirt.so.0(+0xa7927)[0x7fe1aeda2927] //lib/libvirt/connection- driver/libvirt_driver_nodedev.so(+0x947d)[0x7fe1958a047d] //lib/libvirt/connection- driver/libvirt_driver_nodedev.so(+0xa6c2)[0x7fe1958a16c2] //lib/libvirt/connection- driver/libvirt_driver_nodedev.so(+0xaf4e)[0x7fe1958a1f4e] //lib/libvirt.so.0(virStateInitialize+0xb8)[0x7fe1aee6d0a8] libvirtd(+0x15120)[0x7fe1afae6120] //lib/libvirt.so.0(+0xd4975)[0x7fe1aedcf975] /lib64/libpthread.so.0(+0x30316079d1)[0x7fe1ada8c9d1] /lib64/libc.so.6(clone+0x6d)[0x7fe1ac6178fd]
(gdb) bt #0 0x00007ffff4a8f625 in raise () from /lib64/libc.so.6 #1 0x00007ffff4a90e05 in abort () from /lib64/libc.so.6 #2 0x00007ffff4acd537 in __libc_message () from /lib64/libc.so.6 #3 0x00007ffff4b5f527 in __fortify_fail () from /lib64/libc.so.6 #4 0x00007ffff4b5f4f0 in __stack_chk_fail () from /lib64/libc.so.6 #5 0x00007ffff72d0927 in virNetDevGetFeatures (ifname=<value optimized out>, out=<value optimized out>) at util/virnetdev.c:3200 #6 0x00007fffdddce47d in udevProcessNetworkInterface (device=0x7fffd4071f70, def=0x6) at node_device/node_device_udev.c:694 #7 udevGetDeviceDetails (device=0x7fffd4071f70, def=0x6) at node_device/node_device_udev.c:1272 #8 0x00007fffdddcf6c2 in udevAddOneDevice (device=0x7fffd4071f70) at node_device/node_device_udev.c:1394 #9 0x00007fffdddcff4e in udevProcessDeviceListEntry (privileged=<value optimized out>, callback=<value optimized out>, opaque=<value optimized out>) at node_device/node_device_udev.c:1433 #10 udevEnumerateDevices (privileged=<value optimized out>, callback=<value optimized out>, opaque=<value optimized out>) at node_device/node_device_udev.c:1463 #11 nodeStateInitialize (privileged=<value optimized out>, callback=<value optimized out>, opaque=<value optimized out>) at node_device/node_device_udev.c:1773 #12 0x00007ffff739b0a8 in virStateInitialize (privileged=true, callback=0x555555569070 <daemonInhibitCallback>, opaque=0x5555557f1db0) at libvirt.c:777 #13 0x0000555555569120 in daemonRunStateInit (opaque=<value optimized out>) at libvirtd.c:947 #14 0x00007ffff72fd975 in virThreadHelper (data=<value optimized out>) at util/virthread.c:206 #15 0x00007ffff5fba9d1 in start_thread () from /lib64/libpthread.so.0 #16 0x00007ffff4b458fd in clone () from /lib64/libc.so.6
In IRC, we tracked this down to this bit of code:
g_cmd.cmd = ETHTOOL_GFEATURES; g_cmd.size = GFEATURES_SIZE; if (virNetDevGFeatureAvailable(ifname, &g_cmd)) ignore_value(virBitmapSetBit(*out, VIR_NET_DEV_FEAT_TXUDPTNL));
GFEATURES_SIZE is currently defined as 2, but this value needs to be higher in order to support newer kernels. It looks like this code was added in ac3ed2085fcbeecaf5aa347c0b1bffaf94fff293
ethtool calculates this value based on the number of supported features: http://lxr.free-electrons.com/source/net/core/ethtool.c#L55
I don't know enough about this to properly fix this, but raising GFEATURES_SIZE to 3 has fixed this issue for me (though, this will obviously need to go higher as more features get added)
This crash was occurring on a CentOS 6 system, running a the ELRepo kernel- ml kernel. The stock CentOS 6 kernel (2.6.32) does not appear to have sufficient features available to trigger this.
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
participants (3)
-
Brian Rak
-
Laine Stump
-
Moshe Levi