[libvirt-TCK] Disabling the vepa VSI test 300-vsitype.t

Hi list,

since the beginning of this week I've been poking at the last failure [1] in the nwfilter segment of the TCK suite. The errors come from libnl, although I haven't been able to extract what the true underlying issue is, since the interface with ID '8' definitely exists on my system.

A bit of background (you can either clone the repo or look at the Perl script attached): we're configuring the guest network interface as 'direct' with mode VEPA. IIUC, for proper VEPA support you need a compliant external switch, which 1) I don't have and 2) upstream CI planned to run in a nested env won't have either.

The main issue lies in the test trying to set <virtualport> parameters on the interface. I've tried with regular network interfaces, vlan-tagged interfaces (as one of the other error messages complained about a missing vlan tag, which is something VEPA switches supposedly do on their own), and SR-IOV VFs, with no luck.

I'd be happy for any networking insights here, but given that the setup had clearly been tested with specialized HW, I'd suggest simply disabling the test from the suite for upstream purposes. Well, the correct approach would be to introduce a new config option indicating that specialized HW is necessary, since currently the test case kind of abuses the config option for assigning a virtual interface directly to the guest, which in this case is a necessary condition but not a sufficient one. However, with the Avocado<->TCK joint work happening, I'd rather not spend more time with Perl than necessary.

[1]
virNetDevVPortProfileOpSetLink:823 : error during virtual port configuration of ifindex 8: No such device or address
virNetDevVPortProfileOpCommon:958 : internal error: sending of PortProfileRequest failed.

Thanks,
Erik
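For readers without the attached Perl script, the interface definition under discussion looks roughly like what the sketch below generates. This is not the test's actual code: the device name eth0 and the managerid/typeid/typeidversion/instanceid values are placeholders (assumptions), and the virtio model is just an illustrative choice. Only the element and attribute names follow libvirt's documented domain XML for a 'direct' interface with an 802.1Qbg <virtualport>.

```python
import xml.etree.ElementTree as ET


def build_vepa_interface(dev, managerid, typeid, typeidversion, instanceid):
    """Return libvirt <interface> XML for a 'direct' NIC in VEPA mode.

    All argument values are supplied by the caller; nothing here is taken
    from the real 300-vsitype.t script.
    """
    iface = ET.Element("interface", type="direct")
    # Host device the macvtap is attached to, in VEPA mode.
    ET.SubElement(iface, "source", dev=dev, mode="vepa")
    # 802.1Qbg port profile parameters sent towards the external switch.
    vport = ET.SubElement(iface, "virtualport", type="802.1Qbg")
    ET.SubElement(
        vport,
        "parameters",
        managerid=str(managerid),
        typeid=str(typeid),
        typeidversion=str(typeidversion),
        instanceid=instanceid,
    )
    # Guest NIC model; an illustrative assumption.
    ET.SubElement(iface, "model", type="virtio")
    return ET.tostring(iface, encoding="unicode")


# Placeholder values, not the ones used by the TCK test.
example_xml = build_vepa_interface(
    "eth0", 11, 1193047, 2, "09b11c53-8b5c-4eeb-8f00-d84eaa0aaa4f"
)
```

It is the <virtualport> part that makes libvirt send the port profile request that fails in [1].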

On 2/5/20 11:00 AM, Erik Skultety wrote:
Hi list, so since the beginning of this week I've been poking at the last failure [1] in the nwfilter segment of the TCK suite. So, the errors come from libnl although I haven't been able to extract what the true underlying issue is since the interface with ID '8' definitely exists on my system.
A bit of background (you can either clone the repo or look at the Perl script attached), we're configuring the guest network interface as 'direct' with mode VEPA. IIUC, for proper VEPA support you need a compliant external switch which 1) I don't have 2) upstream CI planned to run in a nested env won't have either.
The main issue lies in the test trying to set <virtualport> parameters on the interface. I've tried with regular network interfaces, vlan-tagged interfaces (as one of the other error messages complained about a missing vlan tag - which is something VEPA switches supposedly do on their own), and SR-IOV VFs with no luck. I'd be happy for any networking insights here, but given the setup which had clearly been tested with specialized HW I'd suggest simply disabling the test from the suite for upstream purposes - well, the correct approach
I was involved in this a few years ago... without having the hardware myself. I am fine with you disabling the test case. The only other thing I'd be curious about is whether some changes were occurring in the past that broke this test case (suddenly). You may not know... I don't recall it having been broken, but cannot say for sure. Is there something sensing (now) that the right network setup is not available and therefore it behaves differently and things start failing? Regards, Stefan
would be to introduce a new config option indicating that specialized HW is necessary since currently the test case kind of abuses the config option assigning a virtual interface directly to the guest which in this case is a necessary condition, but not a sufficient one. However, with the Avocado<->TCK joint work happening, I'd rather not spend more time with Perl than necessary.
[1]
virNetDevVPortProfileOpSetLink:823 : error during virtual port configuration of ifindex 8: No such device or address
virNetDevVPortProfileOpCommon:958 : internal error: sending of PortProfileRequest failed.
Thanks, Erik
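A cheap sanity check for the error in [1], as a minimal Python sketch: confirm that the ifindex from the log resolves to an interface name on the host. The helper name ifindex_to_name is made up for illustration. The text "No such device or address" is what Linux reports for ENXIO, which is a different errno from ENODEV ("No such device"), so an existing interface and this error are not contradictory. Reading it as the kernel or driver rejecting the port profile request itself is an assumption, not something confirmed in this thread.

```python
import errno
import socket


def ifindex_to_name(ifindex):
    """Return the interface name for a kernel ifindex, or None if absent."""
    try:
        return socket.if_indextoname(ifindex)
    except OSError:
        # Raised when no interface with that index exists.
        return None


# ENXIO and ENODEV are distinct error codes.
ENXIO_NAME = errno.errorcode[errno.ENXIO]
ENODEV_NAME = errno.errorcode[errno.ENODEV]
```

Calling ifindex_to_name(8) on the affected host should return the name of the device the test picked; None would mean the index really is gone.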

On Wed, Feb 05, 2020 at 01:47:24PM -0500, Stefan Berger wrote:
On 2/5/20 11:00 AM, Erik Skultety wrote:
Hi list, so since the beginning of this week I've been poking at the last failure [1] in the nwfilter segment of the TCK suite. So, the errors come from libnl although I haven't been able to extract what the true underlying issue is since the interface with ID '8' definitely exists on my system.
A bit of background (you can either clone the repo or look at the Perl script attached), we're configuring the guest network interface as 'direct' with mode VEPA. IIUC, for proper VEPA support you need a compliant external switch which 1) I don't have 2) upstream CI planned to run in a nested env won't have either.
The main issue lies in the test trying to set <virtualport> parameters on the interface. I've tried with regular network interfaces, vlan-tagged interfaces (as one of the other error messages complained about a missing vlan tag - which is something VEPA switches supposedly do on their own), and SR-IOV VFs with no luck. I'd be happy for any networking insights here, but given the setup which had clearly been tested with specialized HW I'd suggest simply disabling the test from the suite for upstream purposes - well, the correct approach
I was involved in this a few years ago... without having the hardware myself. I am fine with you disabling the test case.
Interesting, can you elaborate on how it's supposed to work, then? Laine mentioned that the code paths being exercised work only on top of SR-IOV VFs, so I guess you made use of those at least?
The only other thing I'd be curious about is whether some changes were occurring in the past that broke this test case (suddenly). You may not know... I don't recall it having been broken, but cannot say for sure. Is there something sensing (now) that the right network setup is not available and therefore it behaves differently and things start failing?
It's been a looong time since the tests were run on a regular basis, so it's very much possible that the XML interface params now result in a slightly different network setup in the real world. It may not be just libvirt, but also the underlying stack. The question is whether it's really worth pursuing, or whether to come up with a new test case instead, since this one hasn't been updated for about 8 years or so.
Erik

On 2/5/20 11:00 AM, Erik Skultety wrote:
Hi list, so since the beginning of this week I've been poking at the last failure [1] in the nwfilter segment of the TCK suite. So, the errors come from libnl although I haven't been able to extract what the true underlying issue is since the interface with ID '8' definitely exists on my system.
A bit of background (you can either clone the repo or look at the Perl script attached), we're configuring the guest network interface as 'direct' with mode VEPA. IIUC, for proper VEPA support you need a compliant external switch which 1) I don't have 2) upstream CI planned to run in a nested env won't have either.
The main issue lies in the test trying to set <virtualport> parameters on the interface. I've tried with regular network interfaces, vlan-tagged interfaces (as one of the other error messages complained about a missing vlan tag - which is something VEPA switches supposedly do on their own), and SR-IOV VFs with no luck.
I don't have the mental energy to trace through the code, but definitely 802.1Qbh only works on an SRIOV VF, and definitely the code is passing VF# all up and down the code stack for 802.1Qbg as well.
I'd be happy for any networking insights here, but given the setup which had clearly been tested with specialized HW I'd suggest simply disabling the test from the suite for upstream purposes
Yes, it should be disabled. That test already "self-skips" if lldptool isn't installed (right?), and I had always thought there was no reason to have that tool installed unless you had an 802.1Qbg-capable switch. So how is it that you're getting the test to fail rather than skip? Did you actually install the lldpad package? If so, what for? Is it used for something else? (I seriously doubt that anyone has ever run that test aside from maybe Gerhard Stenzel (the IBM person who added it) and possibly some IBM QE). If you need lldpad installed for other purposes, then I guess yeah, you should make some other config option to disable the test. (maybe it should have its own list of network interfaces separate from the list that's already in the config file)
- well, the correct approach would be to introduce a new config option indicating that specialized HW is necessary since currently the test case kind of abuses the config option assigning a virtual interface directly to the guest which in this case is a necessary condition, but not a sufficient one.
The existence/absence of lldptool (which is in the lldpad package, at least on Fedora) has been that option.
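For the planned Python port, the gating described here could look like the minimal sketch below. It is not the existing Perl logic: the function name and the `enabled` flag are hypothetical, with `enabled` standing in for the separate opt-in config option suggested above, and only the lldptool lookup mirrors the current self-skip behaviour as described in this thread.

```python
import shutil


def vsi_test_skip_reason(enabled=False, which=shutil.which):
    """Return a reason to skip the VSI test, or None if it may run.

    `enabled` is a hypothetical explicit opt-in for hosts attached to an
    802.1Qbg-capable switch. `which` is injectable so the check can be
    exercised without touching the real PATH.
    """
    # Current behaviour per the thread: no lldptool means skip.
    if which("lldptool") is None:
        return "lldptool not found (lldpad not installed)"
    # Proposed extra gate: lldptool alone is necessary, not sufficient.
    if not enabled:
        return "802.1Qbg-capable hardware not declared in config"
    return None
```

With the extra flag, having lldpad installed for unrelated reasons would lead to a skip instead of a failure.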
However, with the Avocado<->TCK joint work happening, I'd rather not spend more time with Perl than necessary.
[1]
virNetDevVPortProfileOpSetLink:823 : error during virtual port configuration of ifindex 8: No such device or address
virNetDevVPortProfileOpCommon:958 : internal error: sending of PortProfileRequest failed.
Thanks, Erik

On Wed, Feb 05, 2020 at 09:03:56PM -0500, Laine Stump wrote:
On 2/5/20 11:00 AM, Erik Skultety wrote:
Hi list, so since the beginning of this week I've been poking at the last failure [1] in the nwfilter segment of the TCK suite. So, the errors come from libnl although I haven't been able to extract what the true underlying issue is since the interface with ID '8' definitely exists on my system.
A bit of background (you can either clone the repo or look at the Perl script attached), we're configuring the guest network interface as 'direct' with mode VEPA. IIUC, for proper VEPA support you need a compliant external switch which 1) I don't have 2) upstream CI planned to run in a nested env won't have either.
The main issue lies in the test trying to set <virtualport> parameters on the interface. I've tried with regular network interfaces, vlan-tagged interfaces (as one of the other error messages complained about a missing vlan tag - which is something VEPA switches supposedly do on their own), and SR-IOV VFs with no luck.
I don't have the mental energy to trace through the code, but definitely 802.1Qbh only works on an SRIOV VF, and definitely the code is passing VF# all up and down the code stack for 802.1Qbg as well.
I'd be happy for any networking insights here, but given the setup which had clearly been tested with specialized HW I'd suggest simply disabling the test from the suite for upstream purposes
Yes, it should be disabled.
That test already "self-skips" if lldptool isn't installed (right?), and I had always thought there was no reason to have that tool installed unless you had an 802.1Qbg-capable switch. So how is it that you're getting the test to fail rather than skip? Did you actually install the lldpad package?
Well, I guess I did install it after all since TCK builddep and git repo setup in a clean environment didn't pull the package in, which means that the test is already disabled properly like you said.
If so, what for? Is it used for something else? (I seriously doubt that anyone has ever run that test aside from maybe Gerhard Stenzel (the IBM
It's still puzzling me WTH is going on in there, but poking around libnl hasn't been very productive so far, and since the test *is* disabled by default I guess I don't need to care anymore. We can keep it in, but as far as Avocado goes, once we start migrating the test cases to Python this one will be excluded for sure. So if anyone is going to be interested in it, they will have to fix it, verify it, port it and propose it upstream. Alternatively, they can run it internally as long as the results are reported back to upstream.
person who added it) and possibly some IBM QE). If you need lldpad installed for other purposes, then I guess yeah, you should make some other config option to disable the test. (maybe it should have its own list of network interfaces separate from the list that's already in the config file)
- well, the correct approach would be to introduce a new config option indicating that specialized HW is necessary since currently the test case kind of abuses the config option assigning a virtual interface directly to the guest which in this case is a necessary condition, but not a sufficient one.
The existence/absence of lldptool (which is in the lldpad package, at least on Fedora) has been that option.
Yep, that is correct, I just can't remember why I installed it in the first place, probably because the test is marked as skipped and I was curious? Thanks, Erik
participants (3): Erik Skultety, Laine Stump, Stefan Berger