
On 3/23/21 5:37 AM, Daniel P. Berrangé wrote:
On Mon, Mar 22, 2021 at 05:30:34PM -0400, Laine Stump wrote:
Commit 114e3b42 modified virDomainNetFindIdx() to compare the alias name of the <interface> being matched (in addition to already-existing match of MAC address and PCI/CCW address). This was done in response to https://bugzilla.redhat.com/1926190 which complained that it wasn't possible to unplug an interface that had the same MAC address as another <interface> (which is the way interfaces are setup when using <teaming> - the "team" is identified in the guest virtio-net driver by looking for another interface with the same MAC as the virtio-net device) - the reporter of that bug did not have PCI addresses of the devices easily available when attempting to unplug one of the two devices, and so needed some other way to disambiguate the two interfaces with matching MACs.
Unfortunately, this caused a regression which was caught during QE testing - in the past it had been possible to use virDomainUpdateDevice (which also calls virDomainNetFindIdx()) to modify the alias name of an existing interface - with the change in commit 114e3b42 this was no longer possible (since we would try to match the new alias, which would of course always fail).
The solution to this regression is to allow mismatched alias names when calling virDomainNetFindIdx() for purposes of updating a device (indicated by the new bool argument "partialMatch"). When calling for unplug the old behavior is maintained - in that case the alias name must still match.
I find this seriously dubious. Essentially this is saying that alias is the unique identifier used to match, except when it isn't used to match. Renaming the unique identifier is a questionable action, and I'd claim that fact that it worked previously is an accident rather than by design.
Okay, I'm fine with that; prefer it, even (truthfully as soon as you said that, I wondered why I didn't push back on the regression claim - I guess there's just so much "accidental behavior" that we've ended up codifying (a couple that come to mind are the tangentially related "user specified alias names are ignored *unless they being with "ua-", and "tap device names beginning with "vnet" are ignored), that I've gotten used to quiet compliance unless it's something I have a strong opinion about...). Now I just need to figure out how to sort out the BZ status. Anyway, the other thing to come out of this is trying out GArray - my verdict is that 1) I like that the length is part of the GArray rather than our practice of using a separate variable, and 2) being required to use a big long macro where you must even specify the type of the element in order to simply access the element at a specific index is ugly and cumbersome. I mean, really: matches[i] vs g_array_index(matches, size_t, i) ?? Couldn't they at least have written the macro to use typeof(*matches)?
Because we need to keep track of potentially multiple "partial" matches so that we can go back later and try to disambiguate when necessary, I needed an array to hold the indexes of all the matches during the "first round". There is guidance in glib-adoption.rst saying that new libvirt code should prefer GArray/GPtrArray. A couple of adventurous souls have used GPtrArray, but so far nobody has used GArray, and I decided this was a good chance to try that out. It went okay.
Reported-by: Yalan Zhang <yalzhang@redhat.com> Signed-off-by: Laine Stump <laine@redhat.com> ---
I realized while writing this patch that an update-device test case in qemuhotplugtest would have caught this regression and so I probably should add such a test case to prevent it happening again, but the testing harness for update-device was only ever made to work for graphics devices, meaning there's some unknown amount of investigating and rejiggering that needs to be done to make such a test work (a quick attempt failed). Since someone is waiting on the fix for this regression, I'm hoping that I can get a reprieve on the "add a test case to catch thre regression" part that should accompany a bugfix like this in exchange for a promise that I will start looking into that immediately after I get this pushed (and then backported so testing of the bugzilla above can be completed)
src/conf/domain_conf.c | 207 +++++++++++++++++++++++++-------------- src/conf/domain_conf.h | 2 +- src/libxl/libxl_driver.c | 4 +- src/lxc/lxc_driver.c | 6 +- src/qemu/qemu_driver.c | 6 +- src/qemu/qemu_hotplug.c | 4 +- 6 files changed, 145 insertions(+), 84 deletions(-)
Regards, Daniel