On 08/11/2015 07:41 AM, Ján Tomko wrote:
[reducing the cc-list]
On Tue, Aug 11, 2015 at 02:47:09AM -0400, Laine Stump wrote:
> This started out as a fix for a crash reported in IRC and on libvir-list:
>
>
https://www.redhat.com/archives/libvir-list/2015-August/msg00162.html
>
> but as I examined the existing code I found so many small nits to pick
> in surrounding functions that I decided to fix them all in this patch.
>
> The most important fix here is that virNetDevGetFeatures was creating
> a struct ethtool_gfeatures object as a local on the stack and,
> although the struct is defined with 0 elements in its features array,
> we were telling the ethtool ioctl that we had made space for 2
> elements. This led to a crash, as outlined in the email above. The fix
> for this is to allocate the memory for the ethtool_gfeatures object
> using VIR_ALLOC_VAR, adding on GFEATURES_SIZE elements of type
> ethtool_get_features_block
>
> Beyond that crash fixer, the following fixups were made to the
> hierarchy of functions between virNetDevGetFeatures() and
> virNetDevSendEthtoolIoctl():
This looks like an enumeration of things that should've been separate :)
Possibly, but it would increase the bureaucratic overhead by an order of
magnitude, and many of these things are too inconsequential to bother
changing if they are in a patch by themselves.
> * macros used to examine the gfeatures bits were renamed from
> FEATURE_* to GFEATURE_*
>
> virNetDevSentEthtoolIoctl():
>
> * no need to initialize sock to -1, since it is always set at the top
> of the function.
No functional changes here.
> * remove VIR_DEBUG log (and subsequent *skipping* of error log!) when
> errno is EPERM, EINVAL or EOPNOTSUPP. If one of those errors were
> ever encountered, this would have been *very* problematic, as it
> would have led to one of those situations where virsh reports "an
> error was encountered but the cause is unknown" (or whatever the
> message is when we have an error but no log message).
>
This does not seem problematic, since we do not treat -1 as a failure.
Yeah, I hadn't thought of it that way, although this makes it a problem
that we are ever logging the errors. I a virReportError logging function
is called, we should be failing, and if we're not failing, then we
shouldn't be logging an error.
I'll rework the stuff to make the lower level function never log errors,
and change the high level functions back to ignoring the errors
> * always call VIR_FORCE_CLOSE(sock) since we know that sock is either
> a valid fd, or else -1 (which VIR_FORCE_CLOSE() will skip).
>
No fucntional change there either.
> virNetDevFeatureAvailable()
>
> * simplify it - no need for ret.
>
> * follow libvirt convention of checking for "bobLobLaw(lawblog) < 0"
> instead of "!bobLobLaw(lawblog)".
>
> virNetDevGFeatureAvailable()
>
> * eliminate this function, as it was ill-conceived (it really was only
> checking for one gfeature (TX_UDP_TNL), not *any* gfeature.
>
> virNetDevGetFeatures()
>
> * move all data tables (struct elem) out of the function so that they
> will be statics instead of taking up space on the stack.
>
> * remove pointless/incorrect initialization of i = -1.
>
> * remove unnecessary static initialization of struct ethtool_value cmd
>
> * g_cmd is now a pointer instead of automatic
>
> * use libvirt convention of checking return from
> virNetDevFeatureAvailable() < 0, instead of
> "!virNetDevFeatureAvailable()", and immediately return to caller
> with an error when virNetDevFeatureAvailable() returns an error
> (previously it was ignored).
>
This is the change that makes the lack of error reporting problematic.
> * remove superfluous size_t j, and just re-use i instead.
>
Another style cleanup that could be separated.
But where do you draw the line? If each one of these was put in a
separate patch, there would be 8 or 10 patches here.
But anyway, I cave - I'll separate out the crash fix from all the other
"refactor" stuff and resubmit.
> * runtime allocation/free of proper size object for ethtoolfeatures as
> described above.
>
> * directly call virNetDevSentEthtoolIoctl() instead of now defunct
> virNetDevGFeatureAvailable().
>
> ---
>
> V2: I had been looking for something like VIR_ALLOC_VAR() when writing
> the patch originally, but somehow missed it, and did an ugly hack with
> VIR_ALLOC_N and a union. In this version I clean that up and use
> VIR_ALLOC_VAR() instead.
>
> NB: This patch does *not* attempt to determine the proper number of
> elements for the gfeature array at runtime, as proposed in this patch:
>
>
https://www.redhat.com/archives/libvir-list/2015-August/msg00263.html
>
> since that wasn't the cause of the crash. I'll leave it up to Moshe to
> repost that patch rebased around this one (or whatever ends up being
> pushed) if he thinks there is value to it.
>
> Also - as I mentioned in earlier mail in response to the crash, I
> noticed when looking into the gfeatures ethtool code that it looks to
> me like TX_UDP_TNL should actually be 26 rather than 25, but I may be
> missing something. Moshe - can you either confirm or deny that? Where
> did you get the value 25 from?
> src/util/virnetdev.c | 177 +++++++++++++++++++++++----------------------------
> 1 file changed, 79 insertions(+), 98 deletions(-)
>
> diff --git a/src/util/virnetdev.c b/src/util/virnetdev.c
> index 1e20789..05fbff5 100644
> --- a/src/util/virnetdev.c
> +++ b/src/util/virnetdev.c
> @@ -91,10 +91,10 @@ VIR_LOG_INIT("util.netdev");
> #if HAVE_DECL_ETHTOOL_GFEATURES
> # define TX_UDP_TNL 25
> # define GFEATURES_SIZE 2
Do we need to (re)define these? I'd expect these constants to be present
in some header file.
Yeah, that's why I've been asking about their origin in all the other
threads. I greatly dislike pulling magic constants out of thin air.
Unfortunately I've so far only seen something related-looking defined in
a kernel header (and that value is apparently wrong, according to
Moshe). Here's what I saw in the kernel:
http://lxr.free-electrons.com/source/net/core/ethtool.c#L87
[NETIF_F_GSO_UDP_TUNNEL_BIT] = /"tx-udp_tnl-segmentation"/,
then in
http://lxr.free-electrons.com/source/include/linux/netdev_features.h
NETIF_F_GSO_UDP_TUNNEL_BIT is defined in an enum, and when I compile
that enum into a program and print out the value of
NETIF_F_GSO_UDP_TUNNEL_BIT, I get "26". On the other hand, Moshe has
shown in other email where he got the value 25 from in the ethtool
commandline utility's source.