[PATCH 0/3] Fix ARP table parsing over netlink messages

Somehow it happened that some kernels (I noticed this with 6.10.0 and 6.10.2 on various machines) started sending NLMSG_DONE message (as they probably should've even before), but our check for it could've never worked and now `virsh domifaddr --source arp` sometimes fails with "wrong nlmsg len". So I set out on a quest to tame the netlink beast and though fierceful debugging and... Oh, I forgot, nobody reads cover letters. Never mind. Martin Kletzander (3): virarptable: Properly calculate rtattr length virarptable: Fix check for message length virarptable: End parsing earlier in case of NLMSG_DONE src/util/virarptable.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) -- 2.46.0

Use convenience macro from libnl3 which does almost the same thing we were doing, but also aligns the payload length. Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- src/util/virarptable.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/src/util/virarptable.c b/src/util/virarptable.c index 299dddd664ab..d8e41c5a8668 100644 --- a/src/util/virarptable.c +++ b/src/util/virarptable.c @@ -102,8 +102,7 @@ virArpTableGet(void) return table; VIR_WARNINGS_NO_CAST_ALIGN - parse_rtattr(tb, NDA_MAX, NDA_RTA(r), - nh->nlmsg_len - NLMSG_LENGTH(sizeof(*r))); + parse_rtattr(tb, NDA_MAX, NDA_RTA(r), NLMSG_PAYLOAD(nh, sizeof(*r))); VIR_WARNINGS_RESET if (tb[NDA_DST] == NULL || tb[NDA_LLADDR] == NULL) -- 2.46.0

On 8/16/24 8:45 AM, Martin Kletzander wrote:
Use convenience macro from libnl3
Actually the NLMSG_PAYLOAD() macro (which I had never used nor heard of before this, but just looked it up in the header files so I could be a know-it-all) is in the basic kernel netlink.h, not in libnl.
which does almost the same thing we were doing, but also aligns the payload length.
"pads out the payload length to a multiple of NLMSG_ALIGNTO (4) bytes" to be more long-winded :-)
Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- src/util/virarptable.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/src/util/virarptable.c b/src/util/virarptable.c index 299dddd664ab..d8e41c5a8668 100644 --- a/src/util/virarptable.c +++ b/src/util/virarptable.c @@ -102,8 +102,7 @@ virArpTableGet(void) return table;
VIR_WARNINGS_NO_CAST_ALIGN - parse_rtattr(tb, NDA_MAX, NDA_RTA(r), - nh->nlmsg_len - NLMSG_LENGTH(sizeof(*r))); + parse_rtattr(tb, NDA_MAX, NDA_RTA(r), NLMSG_PAYLOAD(nh, sizeof(*r))); VIR_WARNINGS_RESET
if (tb[NDA_DST] == NULL || tb[NDA_LLADDR] == NULL)
Reviewed-by: Laine Stump <laine@redhat.com>

The previous check was all wrong since it calculated the how long would the netlink message be if the netlink header was the payload and then subtracted that from the whole message length, a variable that was not used later in the code. This check can fail if there are no additional payloads, struct rtattr in particular, which we are parsing later, however the RTA_OK macro would've caught that anyway. Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- src/util/virarptable.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/util/virarptable.c b/src/util/virarptable.c index d8e41c5a8668..8e805fb35332 100644 --- a/src/util/virarptable.c +++ b/src/util/virarptable.c @@ -84,7 +84,7 @@ virArpTableGet(void) int len = nh->nlmsg_len; void *addr; - if ((len -= NLMSG_LENGTH(sizeof(*nh))) < 0) { + if (len < NLMSG_SPACE(sizeof(*r))) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("wrong nlmsg len")); goto cleanup; -- 2.46.0

On 8/16/24 8:45 AM, Martin Kletzander wrote:
The previous check was all wrong since it calculated the how long would the netlink message be if the netlink header was the payload and then subtracted that from the whole message length, a variable that was not used later in the code. This check can fail if there are no additional payloads, struct rtattr in particular, which we are parsing later, however the RTA_OK macro would've caught that anyway.
I've always thought that netlink was unnecessarily complicated, and the fact that this code existed for this long and nobody noticed it is just a symptom of that :-/
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Fixes: a176d67cdfaf5b8237a7e3a80d8be0e6bdf2d8fd
--- src/util/virarptable.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/util/virarptable.c b/src/util/virarptable.c index d8e41c5a8668..8e805fb35332 100644 --- a/src/util/virarptable.c +++ b/src/util/virarptable.c @@ -84,7 +84,7 @@ virArpTableGet(void) int len = nh->nlmsg_len; void *addr;
- if ((len -= NLMSG_LENGTH(sizeof(*nh))) < 0) { + if (len < NLMSG_SPACE(sizeof(*r))) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("wrong nlmsg len")); goto cleanup;
I prefer Ondrej Mosnáček's suggested change here: https://bugzilla.redhat.com/2302245#c7 - he eliminates "len" entirely and replaces it with nh->nlmsg_len. Reviewed-by: Laine Stump <laine@redhat.com> (if you get rid of len)

Check for the last multipart message right as the first thing. The presumption probably was that the last message might still contain a payload we want to parse. However that cannot be true since that would have to be a type RTM_NEWNEIGH. This was not caught because older kernels were note sending NLMSG_DONE and probably relied on the fact that the parsing just stops after all the messages are walked through, which the NLMSG_OK macro successfully did. Resolves: https://issues.redhat.com/browse/RHEL-52449 Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- So technically this still has some issues, maybe. I could not find if our usage of libnl3 makes it easier for us so that we do not have to check for NLMSG_{ERROR,OVERRUN,NOOP} or whether these checks should be here as well. If yes, then we should add them. And we have (some of) these checks elsewhere in the code, so "maybe". Another thing is that we could avoid such errors by using nl_socket_set_cb(), calling nl_recvmsgs_default() and then parsing only the valid messages in a callback. On top of that we could have an abstraction on top this to utilise in all the netlink dumps we do, ditching our current abstraction which was a bit hard for me to go through, to be honest. And of course there might be other places in our codebase that expect the same behaviour as this code did and we should fix 'em all. After all the debugging for this piece I did not even check for those, maybe if this gets in I'll have a long think about it. src/util/virarptable.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/util/virarptable.c b/src/util/virarptable.c index 8e805fb35332..604019c62a37 100644 --- a/src/util/virarptable.c +++ b/src/util/virarptable.c @@ -84,6 +84,9 @@ virArpTableGet(void) int len = nh->nlmsg_len; void *addr; + if (nh->nlmsg_type == NLMSG_DONE) + return table; + if (len < NLMSG_SPACE(sizeof(*r))) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("wrong nlmsg len")); @@ -98,9 +101,6 @@ virArpTableGet(void) (!(r->ndm_state == NUD_STALE || r->ndm_state == NUD_REACHABLE))) continue; - if (nh->nlmsg_type == NLMSG_DONE) - return table; - VIR_WARNINGS_NO_CAST_ALIGN parse_rtattr(tb, NDA_MAX, NDA_RTA(r), NLMSG_PAYLOAD(nh, sizeof(*r))); VIR_WARNINGS_RESET -- 2.46.0

On 8/16/24 8:45 AM, Martin Kletzander wrote:
Check for the last multipart message right as the first thing. The presumption probably was that the last message might still contain a payload we want to parse. However that cannot be true since that would have to be a type RTM_NEWNEIGH.
This was not caught because older kernels were note sending NLMSG_DONE and probably relied on the fact that the parsing just stops after all the messages are walked through, which the NLMSG_OK macro successfully did.
Also: Resolves: https://bugzilla.redhat.com/2302245 Fixes: a176d67cdfaf5b8237a7e3a80d8be0e6bdf2d8fd (Hmm, and BTW I had suggested that patch 2/3 should have that Fixes: line, but I guess it can't be *totally* fixed until this patch, since prior to that we're checking the values of a struct ndmsg before checking the message type (i.e. it might not be an RTM_NEWNEIGH))
Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- So technically this still has some issues, maybe.
I'm pretty sure there are other issues with our netlink code, but the effect of those issues is unknown, and unrelated to the current bug which your code fixes. So, IMO, "this" doesn't have some issues :-)
I could not find if our usage of libnl3 makes it easier for us so that we do not have to check for NLMSG_{ERROR,OVERRUN,NOOP} or whether these checks should be here as well
Well, virArpTableGet() calls virNetlinkGetNeighbor(), which allows responses where the *first* message in the response is NLMSG_NEWNEIGH or NLMSG_ERROR, and that calls virNetlinkTalk() which fails on any NLMSG_ERROR that isn't 0 (an NLMSG_ERRO with error==0 is just an ack packet). virNetlinkTalk() calls virNetLinkCommand(), which just calls nl_recv(), and as far as I can tell from the libnl API docs, there's no special handling there for any type of message.
If yes, then we should add them. And we have (some of) these checks elsewhere in the code, so "maybe".
Have you ever tried to assimilate the libnl documentation? (https://www.infradead.org/~tgr/libnl/doc/api/) I'll refer you to my comment further down marked with "[*]".
Another thing is that we could avoid such errors by using nl_socket_set_cb(), calling nl_recvmsgs_default() and then parsing only the valid messages in a callback.
This somehow reminds me of code that we had in netcf that I think needed to get a list of interfaces, and we used the nl_cache_* functions for this, which in the end (I thought) made the code even more complicated than if we'd just used lower level functions. There is some very useful stuff in libnl, but I think some times it complicates in the name of "simplifying". I guess I'm trying to say "Maybe, but don't get your hopes up".
On top of that we could have an abstraction on top this to utilise in all the netlink dumps we do, ditching our current abstraction which was a bit hard for me to go through, to be honest.
Everything about netlink from the very beginning has been "hard to go through", especially when mixed in with libnl. I think originally there were a couple different usages of netlink/libnl in libvirt authored by a couple different people, and then another usage or two came along with each new user trying to refactor existing code to work for them too, but without ever any underlying plan. Or something like that. The result is that (IMO) trying to understand some of our netlink code might do you more harm than good :-/. Not that I'm cynical or anything. (P.S. Ditching the current abstraction(s) and making something new that works better for everyone would be wonderful to see. As long as someone else did it. The dreadful side of that is that I may soon need to add yet another usage of netlink to the code, and that could mean that either I will a) go through the pain and time consuming frustration of doing said ditching/rewriting so that my new code can be nicer, or b) live with the embarrasment of adding yet another hacky house of netlink cards to the code :-/)
And of course there might be other places in our codebase that expect the same behaviour as this code did and we should fix 'em all.
One other problem I noticed during my short dip into our netlink code last week is that in all cases except getting VFINFO (for SRIOV devices), we only read a single response packet after we make a request, when any response could have multiple packets.
After all the debugging for this piece I did not even check for those, maybe if this gets in I'll have a long think about it.
[*]Good luck with that. Be sure the beer cooler next to your desk is well stocked.
src/util/virarptable.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/src/util/virarptable.c b/src/util/virarptable.c index 8e805fb35332..604019c62a37 100644 --- a/src/util/virarptable.c +++ b/src/util/virarptable.c @@ -84,6 +84,9 @@ virArpTableGet(void) int len = nh->nlmsg_len; void *addr;
+ if (nh->nlmsg_type == NLMSG_DONE) + return table;
I would prefer "break;" here rather than "return table;" just in case someone in the future adds something that requires some sort of cleanup code at the end of the function. This is fine too though. Reviewed-by: Laine Stump <laine@redhat.com>
+ if (len < NLMSG_SPACE(sizeof(*r))) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("wrong nlmsg len")); @@ -98,9 +101,6 @@ virArpTableGet(void) (!(r->ndm_state == NUD_STALE || r->ndm_state == NUD_REACHABLE))) continue;
- if (nh->nlmsg_type == NLMSG_DONE) - return table; - VIR_WARNINGS_NO_CAST_ALIGN parse_rtattr(tb, NDA_MAX, NDA_RTA(r), NLMSG_PAYLOAD(nh, sizeof(*r))); VIR_WARNINGS_RESET

On Sat, Aug 17, 2024 at 01:07:45AM -0400, Laine Stump wrote:
On 8/16/24 8:45 AM, Martin Kletzander wrote:
Check for the last multipart message right as the first thing. The presumption probably was that the last message might still contain a payload we want to parse. However that cannot be true since that would have to be a type RTM_NEWNEIGH.
This was not caught because older kernels were note sending NLMSG_DONE and probably relied on the fact that the parsing just stops after all the messages are walked through, which the NLMSG_OK macro successfully did.
Also:
Resolves: https://bugzilla.redhat.com/2302245 Fixes: a176d67cdfaf5b8237a7e3a80d8be0e6bdf2d8fd
(Hmm, and BTW I had suggested that patch 2/3 should have that Fixes: line, but I guess it can't be *totally* fixed until this patch, since prior to that we're checking the values of a struct ndmsg before checking the message type (i.e. it might not be an RTM_NEWNEIGH))
Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- So technically this still has some issues, maybe.
I'm pretty sure there are other issues with our netlink code, but the effect of those issues is unknown, and unrelated to the current bug which your code fixes. So, IMO, "this" doesn't have some issues :-)
I could not find if our usage of libnl3 makes it easier for us so that we do not have to check for NLMSG_{ERROR,OVERRUN,NOOP} or whether these checks should be here as well
Well, virArpTableGet() calls virNetlinkGetNeighbor(), which allows responses where the *first* message in the response is NLMSG_NEWNEIGH or NLMSG_ERROR, and that calls virNetlinkTalk() which fails on any NLMSG_ERROR that isn't 0 (an NLMSG_ERRO with error==0 is just an ack packet). virNetlinkTalk() calls virNetLinkCommand(), which just calls nl_recv(), and as far as I can tell from the libnl API docs, there's no special handling there for any type of message.
If yes, then we should add them. And we have (some of) these checks elsewhere in the code, so "maybe".
Have you ever tried to assimilate the libnl documentation? (https://www.infradead.org/~tgr/libnl/doc/api/) I'll refer you to my comment further down marked with "[*]".
I have, that's why it took me almost 2 full days to figure this out and dig myself out of the hellhole. It's not _bad_ documentation, it's definitely below average from what I remember dealing with. Yet I still had the libnl3 source code opened on the side to really understand what the documentation was trying to say.
Another thing is that we could avoid such errors by using nl_socket_set_cb(), calling nl_recvmsgs_default() and then parsing only the valid messages in a callback.
This somehow reminds me of code that we had in netcf that I think needed to get a list of interfaces, and we used the nl_cache_* functions for this, which in the end (I thought) made the code even more complicated than if we'd just used lower level functions. There is some very useful stuff in libnl, but I think some times it complicates in the name of "simplifying".
I guess I'm trying to say "Maybe, but don't get your hopes up".
It can't be worse than it is now, right? Anakin^WLaine: "" It can't be worse than it is now, right? I guess it's one of those "I can't know until I try!". That said I am not promising anything! I just started to fall down from the mount stupid of Dunning-Kruger effect when it comes down to libnl3. I'm sure it'll be fine!!! *distant crying noises*
On top of that we could have an abstraction on top this to utilise in all the netlink dumps we do, ditching our current abstraction which was a bit hard for me to go through, to be honest.
Everything about netlink from the very beginning has been "hard to go through", especially when mixed in with libnl. I think originally there were a couple different usages of netlink/libnl in libvirt authored by a couple different people, and then another usage or two came along with each new user trying to refactor existing code to work for them too, but without ever any underlying plan. Or something like that. The result is that (IMO) trying to understand some of our netlink code might do you more harm than good :-/. Not that I'm cynical or anything.
Too late, I should've known that at the beginning of last week.
(P.S. Ditching the current abstraction(s) and making something new that works better for everyone would be wonderful to see. As long as someone else did it. The dreadful side of that is that I may soon need to add yet another usage of netlink to the code, and that could mean that either I will a) go through the pain and time consuming frustration of doing said ditching/rewriting so that my new code can be nicer, or b) live with the embarrasment of adding yet another hacky house of netlink cards to the code :-/)
That "as long as someone else did it" is exactly how I feel about it after going through both libvirt and libnl code just to figure out few minor things. I mean look at this fix. It should not have taken me two days to figure it out. But if you already know about the fact you'll need to have new usage for netlink in our code than it might be the thing that pushes me over the edge and at least for one of us it will actually be done by "someone else". I'll keep you posted if whether I overcame the fear or not.
And of course there might be other places in our codebase that expect the same behaviour as this code did and we should fix 'em all.
One other problem I noticed during my short dip into our netlink code last week is that in all cases except getting VFINFO (for SRIOV devices), we only read a single response packet after we make a request, when any response could have multiple packets.
Again, depends if it actually is a multipart message, for which's flag we only check in virNetlinkGetErrorCode().
After all the debugging for this piece I did not even check for those, maybe if this gets in I'll have a long think about it.
[*]Good luck with that. Be sure the beer cooler next to your desk is well stocked.
Damn, I should buy one then, I guess. One of the big issues will be to write some tests for it if I want to do it properly. That's going to be a lot of beer... Anyway, all the commits have your suggestions worked in and are pushed now. Thanks.
src/util/virarptable.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/src/util/virarptable.c b/src/util/virarptable.c index 8e805fb35332..604019c62a37 100644 --- a/src/util/virarptable.c +++ b/src/util/virarptable.c @@ -84,6 +84,9 @@ virArpTableGet(void) int len = nh->nlmsg_len; void *addr;
+ if (nh->nlmsg_type == NLMSG_DONE) + return table;
I would prefer "break;" here rather than "return table;" just in case someone in the future adds something that requires some sort of cleanup code at the end of the function. This is fine too though.
Reviewed-by: Laine Stump <laine@redhat.com>
+ if (len < NLMSG_SPACE(sizeof(*r))) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("wrong nlmsg len")); @@ -98,9 +101,6 @@ virArpTableGet(void) (!(r->ndm_state == NUD_STALE || r->ndm_state == NUD_REACHABLE))) continue;
- if (nh->nlmsg_type == NLMSG_DONE) - return table; - VIR_WARNINGS_NO_CAST_ALIGN parse_rtattr(tb, NDA_MAX, NDA_RTA(r), NLMSG_PAYLOAD(nh, sizeof(*r))); VIR_WARNINGS_RESET

On 8/16/24 8:45 AM, Martin Kletzander wrote:
Somehow it happened that some kernels (I noticed this with 6.10.0 and 6.10.2 on various machines) started sending NLMSG_DONE message (as they probably should've even before),
This comment shows the kernel commit that caused it: https://bugzilla.redhat.com/2302245#c4 This bug also points out that we're only reading a single response message, which could theoretically lead to missing the tail of a very long arp table. But that's another battle for another day.
but our check for it could've never worked and now `virsh domifaddr --source arp` sometimes fails with "wrong nlmsg len". So I set out on a quest to tame the netlink beast and though fierceful debugging and... Oh, I forgot, nobody reads cover letters. Never mind.
Oh come on!!! Just when it was getting interesting! Where is the gyring and gimbling in the wabe?? The mome raths?? The mimsy borogroves???? (in case that makes no sense: https://www.poetryfoundation.org/poems/42916/jabberwocky - my 3rd or 4th favorite poem, after 2 or 3 by Dr. Seuss)
Martin Kletzander (3): virarptable: Properly calculate rtattr length virarptable: Fix check for message length virarptable: End parsing earlier in case of NLMSG_DONE
src/util/virarptable.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-)

On Fri, Aug 16, 2024 at 11:52:09PM -0400, Laine Stump wrote:
On 8/16/24 8:45 AM, Martin Kletzander wrote:
Somehow it happened that some kernels (I noticed this with 6.10.0 and 6.10.2 on various machines) started sending NLMSG_DONE message (as they probably should've even before),
This comment shows the kernel commit that caused it:
Oh, it would've been nice to know someone else found it as well =D My bad, I guess.
This bug also points out that we're only reading a single response message, which could theoretically lead to missing the tail of a very long arp table. But that's another battle for another day.
To be honest I did not try tracking down our usage/acceptance of NLM_F_MULTI. It's also one of the reasons I suggested using a tiny bit more of the high level abstraction from libnl3 where we can leave that multipart messages and bit of error handling to it. More in 3/3.
but our check for it could've never worked and now `virsh domifaddr --source arp` sometimes fails with "wrong nlmsg len". So I set out on a quest to tame the netlink beast and though fierceful debugging and... Oh, I forgot, nobody reads cover letters. Never mind.
Oh come on!!! Just when it was getting interesting! Where is the gyring and gimbling in the wabe?? The mome raths?? The mimsy borogroves????
Thank you for reading the commit message. You shall be known as Laine the reader from now on ;)
(in case that makes no sense: https://www.poetryfoundation.org/poems/42916/jabberwocky - my 3rd or 4th favorite poem, after 2 or 3 by Dr. Seuss)
Sounds good, I just need to learn the language first =D
Martin Kletzander (3): virarptable: Properly calculate rtattr length virarptable: Fix check for message length virarptable: End parsing earlier in case of NLMSG_DONE
src/util/virarptable.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-)

On 8/19/24 6:22 AM, Martin Kletzander wrote:
On Fri, Aug 16, 2024 at 11:52:09PM -0400, Laine Stump wrote:
(in case that makes no sense: https://www.poetryfoundation.org/poems/42916/jabberwocky - my 3rd or 4th favorite poem, after 2 or 3 by Dr. Seuss)
Sounds good, I just need to learn the language first =D
It may help to know that it was confusing enough to many native English speakers that a rumor started (and still persists) that Lewis Carroll was "high on psychadelic drugs" when he wrote both The Jabberwocky and Alice in Wonderland. According to the Lewis Carroll Society he wasn't, but it sure sounded like he was (he was just naturally a very wild and crazy thinking guy).
participants (2)
-
Laine Stump
-
Martin Kletzander