[libvirt] [PATCH] util: fix crash when starting macvtap interfaces

This patch resolves https://bugzilla.redhat.com/show_bug.cgi?id=815270 The function virNetDevMacVLanVPortProfileRegisterCallback() takes an arg "virtPortProfile", and was checking it for non-NULL before using it. However, the prototype for virNetDevMacVLanPortProfileRegisterCallback had marked that arg with ATTRIBUTE_NONNULL(). Contrary to what one may think, ATTRIBUTE_NONNULL() does not provide any guarantee that an arg marked as such really is always non-null; the only effect to the code generated by gcc, is that gcc *assumes* it is non-NULL; this results in, for example, the check for a non-NULL value being optimized out. (Unfortunately, this code removal only occurs when optimization is enabled, and I am in the habit of doing local builds with optimization off to ease debugging, so the bug didn't show up in my earlier local testing). In general, virPortProfile might always be NULL, so it shouldn't be marked as ATTRIBUTE_NONNULL. One other function prototype made this same error, so this patch fixes it as well. --- src/util/virnetdevmacvlan.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/util/virnetdevmacvlan.h b/src/util/virnetdevmacvlan.h index 2299f1d..07d54e2 100644 --- a/src/util/virnetdevmacvlan.h +++ b/src/util/virnetdevmacvlan.h @@ -82,7 +82,7 @@ int virNetDevMacVLanRestartWithVPortProfile(const char *cr_ifname, virNetDevVPortProfilePtr virtPortProfile, enum virNetDevVPortProfileOp vmOp) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3) - ATTRIBUTE_NONNULL(4) ATTRIBUTE_NONNULL(5) ATTRIBUTE_RETURN_CHECK; + ATTRIBUTE_NONNULL(4) ATTRIBUTE_RETURN_CHECK; int virNetDevMacVLanVPortProfileRegisterCallback(const char *ifname, const unsigned char *macaddress , @@ -91,5 +91,5 @@ int virNetDevMacVLanVPortProfileRegisterCallback(const char *ifname, virNetDevVPortProfilePtr virtPortProfile, enum virNetDevVPortProfileOp vmOp) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3) -ATTRIBUTE_NONNULL(4) ATTRIBUTE_NONNULL(5) ATTRIBUTE_RETURN_CHECK; +ATTRIBUTE_NONNULL(4) ATTRIBUTE_RETURN_CHECK; #endif /* __UTIL_MACVTAP_H__ */ -- 1.7.10

On 04/25/2012 02:01 PM, Laine Stump wrote:
This patch resolves https://bugzilla.redhat.com/show_bug.cgi?id=815270
The function virNetDevMacVLanVPortProfileRegisterCallback() takes an arg "virtPortProfile", and was checking it for non-NULL before using it. However, the prototype for virNetDevMacVLanPortProfileRegisterCallback had marked that arg with ATTRIBUTE_NONNULL(). Contrary to what one may think, ATTRIBUTE_NONNULL() does not provide any guarantee that an arg marked as such really is always non-null; the only effect to the code generated by gcc, is that gcc *assumes* it is non-NULL; this results in, for example, the check for a non-NULL value being optimized out.
(Unfortunately, this code removal only occurs when optimization is enabled, and I am in the habit of doing local builds with optimization off to ease debugging, so the bug didn't show up in my earlier local testing).
In general, virPortProfile might always be NULL, so it shouldn't be marked as ATTRIBUTE_NONNULL. One other function prototype made this same error, so this patch fixes it as well.
Might be worth linking to http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17308
--- src/util/virnetdevmacvlan.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
ACK. What an insidious bug. As Laine and I discussed on IRC, I'm half wondering if we should just do: #ifdef STATIC_ANALYSIS && /* attributes supported */ # define ATTRIBUTE_NONNULL(n) __attribute__((__nonnull__(n))) #else # define ATTRIBUTE_NONNULL(n) /* empty, due to gcc lameness */ #endif so that our code will be pessimized under normal compiles, but _at least_ places where we have bugs with improper use of the attribute won't cause gcc to miscompile things; but still let us get NULL checking when running clang or Coverity. I also wonder if this has been detected by Coverity (checking a nonnull parameter for NULL is dead code, which Coverity does tend to flag), and we just haven't been following Coverity closely enough to notice. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 04/26/2012 04:46 AM, Eric Blake wrote:
On 04/25/2012 02:01 PM, Laine Stump wrote:
This patch resolves https://bugzilla.redhat.com/show_bug.cgi?id=815270
The function virNetDevMacVLanVPortProfileRegisterCallback() takes an arg "virtPortProfile", and was checking it for non-NULL before using it. However, the prototype for virNetDevMacVLanPortProfileRegisterCallback had marked that arg with ATTRIBUTE_NONNULL(). Contrary to what one may think, ATTRIBUTE_NONNULL() does not provide any guarantee that an arg marked as such really is always non-null; the only effect to the code generated by gcc, is that gcc *assumes* it is non-NULL; this results in, for example, the check for a non-NULL value being optimized out.
(Unfortunately, this code removal only occurs when optimization is enabled, and I am in the habit of doing local builds with optimization off to ease debugging, so the bug didn't show up in my earlier local testing).
In general, virPortProfile might always be NULL, so it shouldn't be marked as ATTRIBUTE_NONNULL. One other function prototype made this same error, so this patch fixes it as well. Might be worth linking to http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17308
--- src/util/virnetdevmacvlan.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) ACK. What an insidious bug.
As Laine and I discussed on IRC, I'm half wondering if we should just do:
#ifdef STATIC_ANALYSIS&& /* attributes supported */ # define ATTRIBUTE_NONNULL(n) __attribute__((__nonnull__(n))) #else # define ATTRIBUTE_NONNULL(n) /* empty, due to gcc lameness */ #endif
so that our code will be pessimized under normal compiles, but _at least_ places where we have bugs with improper use of the attribute won't cause gcc to miscompile things; but still let us get NULL checking when running clang or Coverity.
I also wonder if this has been detected by Coverity (checking a nonnull parameter for NULL is dead code, which Coverity does tend to flag), and we just haven't been following Coverity closely enough to notice. Eric, I ran Coverity on current commit 'f78024b util: fix crash when starting macvtap interfaces', Coverity hasn't complain this issue, although I also enabled '--security' checkers in Coverity.
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On 04/26/2012 02:12 AM, Alex Jia wrote:
On 04/26/2012 04:46 AM, Eric Blake wrote:
On 04/25/2012 02:01 PM, Laine Stump wrote:
This patch resolves https://bugzilla.redhat.com/show_bug.cgi?id=815270
The function virNetDevMacVLanVPortProfileRegisterCallback() takes an arg "virtPortProfile", and was checking it for non-NULL before using it. However, the prototype for virNetDevMacVLanPortProfileRegisterCallback had marked that arg with ATTRIBUTE_NONNULL(). Contrary to what one may think, ATTRIBUTE_NONNULL() does not provide any guarantee that an arg marked as such really is always non-null; the only effect to the code generated by gcc, is that gcc *assumes* it is non-NULL; this results in, for example, the check for a non-NULL value being optimized out.
(Unfortunately, this code removal only occurs when optimization is enabled, and I am in the habit of doing local builds with optimization off to ease debugging, so the bug didn't show up in my earlier local testing).
In general, virPortProfile might always be NULL, so it shouldn't be marked as ATTRIBUTE_NONNULL. One other function prototype made this same error, so this patch fixes it as well. Might be worth linking to http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17308
--- src/util/virnetdevmacvlan.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) ACK. What an insidious bug.
As Laine and I discussed on IRC, I'm half wondering if we should just do:
#ifdef STATIC_ANALYSIS && /* attributes supported */ # define ATTRIBUTE_NONNULL(n) __attribute__((__nonnull__(n))) #else # define ATTRIBUTE_NONNULL(n) /* empty, due to gcc lameness */ #endif
so that our code will be pessimized under normal compiles, but _at least_ places where we have bugs with improper use of the attribute won't cause gcc to miscompile things; but still let us get NULL checking when running clang or Coverity.
I also wonder if this has been detected by Coverity (checking a nonnull parameter for NULL is dead code, which Coverity does tend to flag), and we just haven't been following Coverity closely enough to notice. Eric, I ran Coverity on current commit 'f78024b util: fix crash when starting macvtap interfaces', Coverity hasn't complain this issue, although I also enabled '--security' checkers in Coverity.
The interesting run would be *before* this commit.

On 04/26/2012 02:57 PM, Laine Stump wrote:
On 04/26/2012 04:46 AM, Eric Blake wrote:
On 04/25/2012 02:01 PM, Laine Stump wrote:
This patch resolves https://bugzilla.redhat.com/show_bug.cgi?id=815270
The function virNetDevMacVLanVPortProfileRegisterCallback() takes an arg "virtPortProfile", and was checking it for non-NULL before using it. However, the prototype for virNetDevMacVLanPortProfileRegisterCallback had marked that arg with ATTRIBUTE_NONNULL(). Contrary to what one may think, ATTRIBUTE_NONNULL() does not provide any guarantee that an arg marked as such really is always non-null; the only effect to the code generated by gcc, is that gcc *assumes* it is non-NULL; this results in, for example, the check for a non-NULL value being optimized out.
(Unfortunately, this code removal only occurs when optimization is enabled, and I am in the habit of doing local builds with optimization off to ease debugging, so the bug didn't show up in my earlier local testing).
In general, virPortProfile might always be NULL, so it shouldn't be marked as ATTRIBUTE_NONNULL. One other function prototype made this same error, so this patch fixes it as well. Might be worth linking to http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17308
--- src/util/virnetdevmacvlan.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) ACK. What an insidious bug.
As Laine and I discussed on IRC, I'm half wondering if we should just do:
#ifdef STATIC_ANALYSIS&& /* attributes supported */ # define ATTRIBUTE_NONNULL(n) __attribute__((__nonnull__(n))) #else # define ATTRIBUTE_NONNULL(n) /* empty, due to gcc lameness */ #endif
so that our code will be pessimized under normal compiles, but _at least_ places where we have bugs with improper use of the attribute won't cause gcc to miscompile things; but still let us get NULL checking when running clang or Coverity.
I also wonder if this has been detected by Coverity (checking a nonnull parameter for NULL is dead code, which Coverity does tend to flag), and we just haven't been following Coverity closely enough to notice. Eric, I ran Coverity on current commit 'f78024b util: fix crash when starting macvtap interfaces', Coverity hasn't complain this issue, although I also enabled '--security' checkers in Coverity. The interesting run would be *before* this commit. I reran coverity on commit 'bae1312 build: fix bootstrap on RHEL', however, Coverity hasn't also complain the issue, for details, please refer to attachment. --
On 04/26/2012 02:12 AM, Alex Jia wrote: libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On 04/26/2012 01:44 AM, Alex Jia wrote:
As Laine and I discussed on IRC, I'm half wondering if we should just do:
#ifdef STATIC_ANALYSIS&& /* attributes supported */ # define ATTRIBUTE_NONNULL(n) __attribute__((__nonnull__(n))) #else # define ATTRIBUTE_NONNULL(n) /* empty, due to gcc lameness */ #endif
so that our code will be pessimized under normal compiles, but _at least_ places where we have bugs with improper use of the attribute won't cause gcc to miscompile things; but still let us get NULL checking when running clang or Coverity.
I also wonder if this has been detected by Coverity (checking a nonnull parameter for NULL is dead code, which Coverity does tend to flag), and we just haven't been following Coverity closely enough to notice. Eric, I ran Coverity on current commit 'f78024b util: fix crash when starting macvtap interfaces', Coverity hasn't complain this issue, although I also enabled '--security' checkers in Coverity. The interesting run would be *before* this commit. I reran coverity on commit 'bae1312 build: fix bootstrap on RHEL', however, Coverity hasn't also complain the issue, for details, please refer to attachment.
Hmm - maybe it's worth a bug report to the Coverity folks, as that would be a very nice static check to add. How does clang fare? Meanwhile, it looks like we've got a lot of cleanup to do - there are some real bugs in that Coverity report. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

Hello Eric, Unfortunately, Clang hasn't also complain the issue like Coverity on commit 'bae1312 build: fix bootstrap on RHEL'. Regards, Alex ----- Original Message ----- From: "Eric Blake" <eblake@redhat.com> To: "Alex Jia" <ajia@redhat.com> Cc: "Laine Stump" <laine@laine.org>, libvir-list@redhat.com Sent: Thursday, April 26, 2012 8:55:18 PM Subject: Re: [libvirt] [PATCH] util: fix crash when starting macvtap interfaces On 04/26/2012 01:44 AM, Alex Jia wrote:
As Laine and I discussed on IRC, I'm half wondering if we should just do:
#ifdef STATIC_ANALYSIS&& /* attributes supported */ # define ATTRIBUTE_NONNULL(n) __attribute__((__nonnull__(n))) #else # define ATTRIBUTE_NONNULL(n) /* empty, due to gcc lameness */ #endif
so that our code will be pessimized under normal compiles, but _at least_ places where we have bugs with improper use of the attribute won't cause gcc to miscompile things; but still let us get NULL checking when running clang or Coverity.
I also wonder if this has been detected by Coverity (checking a nonnull parameter for NULL is dead code, which Coverity does tend to flag), and we just haven't been following Coverity closely enough to notice. Eric, I ran Coverity on current commit 'f78024b util: fix crash when starting macvtap interfaces', Coverity hasn't complain this issue, although I also enabled '--security' checkers in Coverity. The interesting run would be *before* this commit. I reran coverity on commit 'bae1312 build: fix bootstrap on RHEL', however, Coverity hasn't also complain the issue, for details, please refer to attachment.
Hmm - maybe it's worth a bug report to the Coverity folks, as that would be a very nice static check to add. How does clang fare? Meanwhile, it looks like we've got a lot of cleanup to do - there are some real bugs in that Coverity report. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 04/25/2012 04:46 PM, Eric Blake wrote:
On 04/25/2012 02:01 PM, Laine Stump wrote:
This patch resolves https://bugzilla.redhat.com/show_bug.cgi?id=815270
The function virNetDevMacVLanVPortProfileRegisterCallback() takes an arg "virtPortProfile", and was checking it for non-NULL before using it. However, the prototype for virNetDevMacVLanPortProfileRegisterCallback had marked that arg with ATTRIBUTE_NONNULL(). Contrary to what one may think, ATTRIBUTE_NONNULL() does not provide any guarantee that an arg marked as such really is always non-null; the only effect to the code generated by gcc, is that gcc *assumes* it is non-NULL; this results in, for example, the check for a non-NULL value being optimized out.
(Unfortunately, this code removal only occurs when optimization is enabled, and I am in the habit of doing local builds with optimization off to ease debugging, so the bug didn't show up in my earlier local testing).
In general, virPortProfile might always be NULL, so it shouldn't be marked as ATTRIBUTE_NONNULL. One other function prototype made this same error, so this patch fixes it as well. Might be worth linking to http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17308
Oops. I pushed before I noticed this comment.
--- src/util/virnetdevmacvlan.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) ACK. What an insidious bug.
As Laine and I discussed on IRC, I'm half wondering if we should just do:
#ifdef STATIC_ANALYSIS && /* attributes supported */ # define ATTRIBUTE_NONNULL(n) __attribute__((__nonnull__(n))) #else # define ATTRIBUTE_NONNULL(n) /* empty, due to gcc lameness */ #endif
so that our code will be pessimized under normal compiles, but _at least_ places where we have bugs with improper use of the attribute won't cause gcc to miscompile things; but still let us get NULL checking when running clang or Coverity.
Or the patch that will be in the next reply to your mail? (STATIC_ANALYSIS is always defined, but could be 0 or 1)
I also wonder if this has been detected by Coverity (checking a nonnull parameter for NULL is dead code, which Coverity does tend to flag), and we just haven't been following Coverity closely enough to notice.
It's a fairly recent change, so very likely nobody has run Coverity against it yet.

The ATTRIBUTE_NONNULL(m) macro normally resolves to the gcc builtin __attribute__((__nonnull__(m))). The effect of this in gcc is unfortunately only to make gcc believe that "m" can never possibly be NULL, *not* to add in any checks to guarantee that it isn't ever NULL (i.e. it is an optimization aid, *not* something to verify code correctness.) - see the following gcc bug report for more details: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17308 Static source analyzers such as clang and coverity apparently can use ATTRIBUTE_NONNULL(), though, to detect dead code (in the case that the arg really is guaranteed non-NULL), as well as situations where an obviously NULL arg is given to the function. https://bugzilla.redhat.com/show_bug.cgi?id=815270 is a good example of a bug caused by erroneous application of ATTRIBUTE_NONNULL(). Several people spent a long time staring at this code and not finding the problem, because the problem wasn't in the function itself, but in the prototype that specified ATTRIBUTE_NONNULL() for an arg that actually *wasn't* always non-NULL, and caused a segv when dereferenced (even though the code that dereferenced the pointer was inside an if() that checked for a NULL pointer, that code was optimized out by gcc). There may be some very small gain to be had from the optimizations that can be inferred from ATTRIBUTE_NONNULL(), but it seems safer to err on the side of generating code that behaves as expected, while turning on the attribute for static analyzers. (dissenting opinions welcome :-) --- src/internal.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/internal.h b/src/internal.h index ef81cda..83f468d 100644 --- a/src/internal.h +++ b/src/internal.h @@ -182,7 +182,7 @@ # endif # ifndef ATTRIBUTE_NONNULL -# if __GNUC_PREREQ (3, 3) +# if __GNUC_PREREQ (3, 3) && STATIC_ANALYSIS # define ATTRIBUTE_NONNULL(m) __attribute__((__nonnull__(m))) # else # define ATTRIBUTE_NONNULL(m) -- 1.7.10

On 04/26/2012 12:56 AM, Laine Stump wrote:
The ATTRIBUTE_NONNULL(m) macro normally resolves to the gcc builtin __attribute__((__nonnull__(m))). The effect of this in gcc is unfortunately only to make gcc believe that "m" can never possibly be NULL, *not* to add in any checks to guarantee that it isn't ever NULL (i.e. it is an optimization aid, *not* something to verify code correctness.) - see the following gcc bug report for more details:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17308
Static source analyzers such as clang and coverity apparently can use ATTRIBUTE_NONNULL(), though, to detect dead code (in the case that the arg really is guaranteed non-NULL), as well as situations where an obviously NULL arg is given to the function.
https://bugzilla.redhat.com/show_bug.cgi?id=815270 is a good example of a bug caused by erroneous application of ATTRIBUTE_NONNULL(). Several people spent a long time staring at this code and not finding the problem, because the problem wasn't in the function itself, but in the prototype that specified ATTRIBUTE_NONNULL() for an arg that actually *wasn't* always non-NULL, and caused a segv when dereferenced (even though the code that dereferenced the pointer was inside an if() that checked for a NULL pointer, that code was optimized out by gcc).
There may be some very small gain to be had from the optimizations that can be inferred from ATTRIBUTE_NONNULL(), but it seems safer to err on the side of generating code that behaves as expected, while turning on the attribute for static analyzers.
(dissenting opinions welcome :-)
None from me :)
--- src/internal.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/internal.h b/src/internal.h index ef81cda..83f468d 100644 --- a/src/internal.h +++ b/src/internal.h @@ -182,7 +182,7 @@ # endif
# ifndef ATTRIBUTE_NONNULL -# if __GNUC_PREREQ (3, 3) +# if __GNUC_PREREQ (3, 3) && STATIC_ANALYSIS # define ATTRIBUTE_NONNULL(m) __attribute__((__nonnull__(m)))
ACK. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 04/26/2012 08:57 AM, Eric Blake wrote:
On 04/26/2012 12:56 AM, Laine Stump wrote:
The ATTRIBUTE_NONNULL(m) macro normally resolves to the gcc builtin __attribute__((__nonnull__(m))). The effect of this in gcc is unfortunately only to make gcc believe that "m" can never possibly be NULL, *not* to add in any checks to guarantee that it isn't ever NULL (i.e. it is an optimization aid, *not* something to verify code correctness.) - see the following gcc bug report for more details:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17308
Static source analyzers such as clang and coverity apparently can use ATTRIBUTE_NONNULL(), though, to detect dead code (in the case that the arg really is guaranteed non-NULL), as well as situations where an obviously NULL arg is given to the function.
https://bugzilla.redhat.com/show_bug.cgi?id=815270 is a good example of a bug caused by erroneous application of ATTRIBUTE_NONNULL(). Several people spent a long time staring at this code and not finding the problem, because the problem wasn't in the function itself, but in the prototype that specified ATTRIBUTE_NONNULL() for an arg that actually *wasn't* always non-NULL, and caused a segv when dereferenced (even though the code that dereferenced the pointer was inside an if() that checked for a NULL pointer, that code was optimized out by gcc).
There may be some very small gain to be had from the optimizations that can be inferred from ATTRIBUTE_NONNULL(), but it seems safer to err on the side of generating code that behaves as expected, while turning on the attribute for static analyzers.
(dissenting opinions welcome :-) None from me :)
Since nobody said they thought it was a bad idea, I've pushed it.
participants (3)
-
Alex Jia
-
Eric Blake
-
Laine Stump