On 08/06/2015 05:18 AM, Moshe Levi wrote:
Hi Brian,

I added a new function to calculate  the gfeature size that is supported by the kernel.
I tested on my setup and it return 2 for me, can you check that it returns 3 in your case.

That is useful, but doesn't fix the memory corruption problem. ethtool_gfeatures is defined like this:


  struct ethtool_gfeatures {
      __u32    cmd;
      __u32    size;
      struct ethtool_get_features_block features[0];
  };

(so features is the *start* of an array of ethtool_get_features_block, but it has 0 elements)
and we define our instance of it as a local with no extra space at the end for the features array:

    struct ethtool_gfeatures g_cmd = { 0 };

Then we call the ioctl telling it that the features array has 2 elements. So the kernel thinks that it can write two elements worth of features at the end, which ends up writing over whatever other local is next on the stack (for 2 * sizeof(ethtool_get_features_block) = 32 bytes). It will do this whether we tell it there are 2 elements, 3 elements, or 100 elements, because we haven't reserved the memory for those elements.

To fix it, g_cmd needs to be allocated as a char* with sizeof(ethtool_gfeatures) + (2 * sizeof(ethtool_get_gfeatures_block)) elements, then freed on exit from the function (I actually don't see why g_cmd is defined in virNetDevGetFeatures() but only used in virNetDevGFeatureAcailable(), but that's a style issue rather than something causing a crash, so it's less important).

I was hoping to make a patch for this last night, but jet lag caught up with me before I could do it. I'm still planning to when I get back from some errands in an hour or two, in case another patch for it hasn't been sent by then.


Thanks,
	Moshe Levi. 

-----Original Message-----
From: libvir-list-bounces@redhat.com [mailto:libvir-list-
bounces@redhat.com] On Behalf Of Brian Rak
Sent: Wednesday, August 05, 2015 7:09 PM
To: libvir-list@redhat.com
Subject: [libvirt] 'stack smashing detected' in 1.2.18 (caused by
virNetDevGFeatureAvailable)

I recently compiled 1.2.18 to start testing with it, and was getting this error on
startup:

*** stack smashing detected ***: libvirtd terminated ======= Backtrace:
========= /lib64/libc.so.6(__fortify_fail+0x37)[0x7fe1ac631527]
/lib64/libc.so.6(__fortify_fail+0x0)[0x7fe1ac6314f0]
//lib/libvirt.so.0(+0xa7927)[0x7fe1aeda2927]
//lib/libvirt/connection-
driver/libvirt_driver_nodedev.so(+0x947d)[0x7fe1958a047d]
//lib/libvirt/connection-
driver/libvirt_driver_nodedev.so(+0xa6c2)[0x7fe1958a16c2]
//lib/libvirt/connection-
driver/libvirt_driver_nodedev.so(+0xaf4e)[0x7fe1958a1f4e]
//lib/libvirt.so.0(virStateInitialize+0xb8)[0x7fe1aee6d0a8]
libvirtd(+0x15120)[0x7fe1afae6120]
//lib/libvirt.so.0(+0xd4975)[0x7fe1aedcf975]
/lib64/libpthread.so.0(+0x30316079d1)[0x7fe1ada8c9d1]
/lib64/libc.so.6(clone+0x6d)[0x7fe1ac6178fd]

(gdb) bt
#0  0x00007ffff4a8f625 in raise () from /lib64/libc.so.6
#1  0x00007ffff4a90e05 in abort () from /lib64/libc.so.6
#2  0x00007ffff4acd537 in __libc_message () from /lib64/libc.so.6
#3  0x00007ffff4b5f527 in __fortify_fail () from /lib64/libc.so.6
#4  0x00007ffff4b5f4f0 in __stack_chk_fail () from /lib64/libc.so.6
#5  0x00007ffff72d0927 in virNetDevGetFeatures (ifname=<value optimized
out>, out=<value optimized out>) at util/virnetdev.c:3200
#6  0x00007fffdddce47d in udevProcessNetworkInterface
(device=0x7fffd4071f70, def=0x6) at node_device/node_device_udev.c:694
#7  udevGetDeviceDetails (device=0x7fffd4071f70, def=0x6) at
node_device/node_device_udev.c:1272
#8  0x00007fffdddcf6c2 in udevAddOneDevice (device=0x7fffd4071f70) at
node_device/node_device_udev.c:1394
#9  0x00007fffdddcff4e in udevProcessDeviceListEntry (privileged=<value
optimized out>, callback=<value optimized out>, opaque=<value optimized
out>)
     at node_device/node_device_udev.c:1433
#10 udevEnumerateDevices (privileged=<value optimized out>,
callback=<value optimized out>, opaque=<value optimized out>) at
node_device/node_device_udev.c:1463
#11 nodeStateInitialize (privileged=<value optimized out>, callback=<value
optimized out>, opaque=<value optimized out>) at
node_device/node_device_udev.c:1773
#12 0x00007ffff739b0a8 in virStateInitialize (privileged=true,
callback=0x555555569070 <daemonInhibitCallback>,
opaque=0x5555557f1db0) at libvirt.c:777
#13 0x0000555555569120 in daemonRunStateInit (opaque=<value optimized
out>) at libvirtd.c:947
#14 0x00007ffff72fd975 in virThreadHelper (data=<value optimized out>) at
util/virthread.c:206
#15 0x00007ffff5fba9d1 in start_thread () from /lib64/libpthread.so.0
#16 0x00007ffff4b458fd in clone () from /lib64/libc.so.6

In IRC, we tracked this down to this bit of code:

     g_cmd.cmd = ETHTOOL_GFEATURES;
     g_cmd.size = GFEATURES_SIZE;
     if (virNetDevGFeatureAvailable(ifname, &g_cmd))
         ignore_value(virBitmapSetBit(*out, VIR_NET_DEV_FEAT_TXUDPTNL));

GFEATURES_SIZE is currently defined as 2, but this value needs to be higher
in order to support newer kernels.  It looks like this code was added in
ac3ed2085fcbeecaf5aa347c0b1bffaf94fff293

ethtool calculates this value based on the number of supported features:
http://lxr.free-electrons.com/source/net/core/ethtool.c#L55

I don't know enough about this to properly fix this, but raising
GFEATURES_SIZE to 3 has fixed this issue for me (though, this will obviously
need to go higher as more features get added)

This crash was occurring on a CentOS 6 system, running a the ELRepo kernel-
ml kernel.  The stock CentOS 6 kernel (2.6.32) does not appear to have
sufficient features available to trigger this.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list