On Tue, 2016-03-15 at 14:33 -0400, John Ferlan wrote:
On 03/14/2016 02:00 PM, Andrea Bolognani wrote:
>
> Word of warning, since this is very likely to get extremely
> verbose: I fully expect to slip, here and there, and refer not
> to the *current* situation as of this patch, but to the
> *desired* situation at the end of the series.
I've rebased to top of tree after your recent pushes... Seems like a
good place to reset focus! I think I need a virtual whiteboard.
So, I've already replied to your comments about patch 22 yesterday
and patch 21 earlier today.
I believe a nice chunk of your concerns have been addressed by my
comments in the latter, so I'll mostly go quickly over them. Of
course let me know if I've glossed over something that was
actually a separate concern.
After I'm done with this reply I'll post a shortened series that
includes the commits that still haven't been pushed, reordering
and fixing them as needed to avoid the problems you noticed in
patches 20 and 21.
(I might decide to call it a day and go home between these two
events, it's getting kind of late already :)
> Such *desired* situation is (not explicitly enough?) laid out
> in patch 19, but I'll expand on it here for clarity:
>
> * the active list and inactive list contain the actual devices
I understand/agree with the goal for actual devices in each list. One
mental block is usage of virPCIDeviceListAdd for activeList and
virPCIDeviceListAddCopy for inactiveList.
This mental block is one of the things patch 22 fixes :)
The second mental block is
whether these are for only managed devices (more on that later).
This should have been clarified by my comments to patch 21.
> * each device is contained either in the active list, or in
> the inactive list. It can't be in both
Sure that's where you're headed, but with the current code this isn't
necessarily true as a device could be on the inactiveList and activeList
during patch 5, but that's the goal of patch 22.
Not sure about the reference to patch 5, but yes, this is the
*desired* situation, ie. how the code is supposed to behave after
the whole series has been applied. If we were already in the
desired situation, then we would need no patches ;)
> * virPCIDevice instances in 'pcidevs' are only used
for
> looking up actual devices in the bookkeeping lists, contain
> no valuable data and are disposable at any time
This is where things start getting a bit fuzzy. I agree with the
concept; however, the current implementation has a gotcha. Consider how
VIR_APPEND_ELEMENT works. Initially 'pcidevs' is created as a new List,
then for each 'hostdev' device, a new virPCIDevicePtr is created and
added to the pcidevs list using virPCIDeviceListAdd. It's taken a bit to
have VIR_APPEND_ELEMENT processing sink in for me ;-), but as I read the
Prepare code it makes use of this processing model instead of creating a
separate copy for the activeList.
So just to be clear, this is again about using virPCIDeviceListAdd()
when adding devices to the active list and virPCIDeviceListAddCopy()
when adding them to the inactive list, right? So one of the very
issues patch 22 is supposed to clear up :)
> > First off - perhaps something for the previous
documentation patch -
> > PrepareDevices is called during the qemuProcessLaunch processing (eg
> > domain startup) for all known host devices. It's also called during
> > qemuDomainAttachHostPCIDevice or libxlDomainAttachHostPCIDevice
> > (hotplug) for the *one* new device to be attached.
> >
> > The purpose of the function is to take the passed hostdev list, move the
> > devices to a managed active host device list for the guest. Devices that
> > do not have the managed attribute are not processed.
> That's not true, though: unmanaged devices *are* processed, even
> if less operations are performed on them. The part about
> detaching them from the host is skipped, because it's supposed
> to have been performed by the user alread, but that's it;
> everything else (resetting them, moving them to the appropriate
> bookkeeping list, storing information about what driver and
> domain is using them) is just the same as managed devices.
>
I think this is where the confusion for me lies. I seem to have also
mistyped a bit - perhaps it's the blurred lines between current and
future perfect (hah!) state.
Initially an unmanaged device is not placed on the inactiveList (e.g.
step2 in the PreparePCIDevices code), but it is placed on the activeList
(step5).
Later on, we can place an unmanaged device on the inactiveList either in
virHostdevReattachPCIDevice using the "stolen" 'pcidevs' entry during
virHostdevReAttachPCIDevices or virHostdevPCINodeDeviceDetach where
virPCIDeviceDetach uses a copy of the device. Although patch22 removes
the addition into the inactiveList if !managed (new step5 and deletion
of lines in ReattachPCIDevice). Still leaves the NodeDeviceDetach path
to understand from whence that request came.
This should have been addressed by the comments on patch 21.
> > > > /* Step 8: Now set the original states for
hostdev def */
> > > > for (i = 0; i < nhostdevs; i++) {
> > > > - virPCIDevicePtr pci;
> > > > + virPCIDevicePtr actual;
> > > > virDomainHostdevDefPtr hostdev = hostdevs[i];
> > > > virDomainHostdevSubsysPCIPtr pcisrc =
&hostdev->source.subsys.u.pci;
> > > >
> > > > @@ -659,24 +659,27 @@
virHostdevPreparePCIDevices(virHostdevManagerPtr mgr,
> > > > if (hostdev->source.subsys.type !=
VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI)
> > > > continue;
> > > >
> > > > - pci = virPCIDeviceListFindByIDs(pcidevs,
> > > > - pcisrc->addr.domain,
> > > > - pcisrc->addr.bus,
> > > > - pcisrc->addr.slot,
> > > > - pcisrc->addr.function);
> > > > + /* We need to look up the actual device because it's the
one
> > > > + * that contains the information we care about
(unbind_from_stub,
> > > > + * remove_slot, reprobe) */
> > > When a device goes from the inactivePCIHostdevs list to the
> > > activePCIHostdevs list "at some point in time" in the future -
does it
> > > do a similar save?
> >
> > I'll answer my own question - because this is the Prepare function and
> > we don't have to worry about it since everything is on the active list.
> Exactly, all devices that go from the inactive list to the active
> list do so via this function... We don't need to handle this
> anywhere else. And by the time this specific lookup is performed,
> all devices have already been moved to the active list (step 5).
>
Right, but if I read patch22 correctly (jumping slightly ahead), only
managed devices would be placed onto the activeList. Currently all
pcidevs are moved to activeList, but your change there moves from
inactiveList to activeList, where AFAICT the inactiveList only has
managed devices.
See my conundrum now? The good news is the fridge is stocked with cold
ones to help me ;-)
Again, comments on patch 21. And for some reason I kinda feel
thirsty now, weird ;)
> > Before patch 15, step 3 would take any of the devices we
removed from
> > the activeList and perform a reset on them.
> >
> > After patch 15, step 3 would do a similar function assuming that no
> > device could not be on neither list.
> NO device could NOT be on NEITHER list... My head is spinning :)
>
> Actually, when we get here the devices are *guaranteed* to be in
> neither list, because we removed them from the active list in
> step 1. But that was the case even before patch 15, so it should
> not be a problem.
>
> >
> > Hopefully this all makes sense - I keep reading it and going back and
> > fort between old and new code. The comment left in the code after step
> > 1 is most helpful...
> I know how you feel! Thanks for sticking with it, hopefully the
> comments I've provided so far (especially the ones at the top of
> this message) are helpful in understanding the thought process
> behind the changes.
>
> I'll get back to you with more comments tomorrow - bet you can't
> wait for them! ;)
No good deed goes unpunished ;-):
As I read the current PreparePCIDevices code:
Step1: Walk pcidevs, make some VFIO/ACS check... Make some PCI Node
Device check... assume everything works as documented ;-)
Step2: Walk pcidevs, if managed, call virPCIDeviceDetach to place *copy*
of pci onto the inactive list, else VIR_DEBUG message
Step3: Walk pcidevs, call virPCIDeviceReset on all devices (both those
managed in inactiveList and those that are not managed)
Step4: Walk hostdevs, SRIOV specific to replace network config (close my
eyes, pray this works)
Step5: Walk pcidevs, *adding* each device to activeList,. Since not
copy, then device is on activeList *and* pcidevs at the same time,
right? Step9 will "steal" from pcidevs to ensure virObjectUnref of
pcidevs doesn't have virPCIDeviceListDispose free what's in activeList.
(This is a key point...)
Step6: Walk pcidevs, remove pci from inactiveList (free all memory of
the copy initially placed on inactiveList).
Step7: Walk pcidevs, if device found on activeList, call SetUsedBy
Step8: Walk hostdevs, if device on pcidevs, then set unbind_from_stub,
remove_slot, and reprobe (since everything on pcidevs is also in the
activeList - magically the activeList data gets updated).
Step9: Walk pcidevs, Stealing each element off of pcidevs (but doesn't
virPCIDeviceFree the element so that virObjectUnref(pcidevs) won't find
anything and do the free - because that element is on the activeList!).
My notes (or what keeps bouncing around while working through this):
1. During the Prepare step, activeList add is not a copy, it uses
VIR_APPEND_ELEMENT. IIUC, this means if 'dev' was dereferenced in
virPCIDeviceListAdd after the APPEND, then there'd be a problem since
dev would be set to NULL after the macro; however, since 'dev' is passed
by value, that means the caller (e.g. the Prepare code) still can
reference the element - it's just that the virPCIDevicePtr is listed in
*two places*.
Exactly, and doing so causes nothing but grief - which is why
patch 22 gets rid of this behaviour completely.
You'll note other places in the code will Steal from one
list and place on another, but this code doesn't do that until step9
where it steals all the pcidevs entries into apparently vaporware.
"Theoretically" we really shouldn't reference activeList elements in
pcidevs, but if we do reference them, then "magically" (so to speak) the
activeList element would be updated. Personally it may be easier to use
a *copy* for the activeList too and then let virObjectUnref handle the
pcidevs entry deletion.
Claiming that I completely agree is an understatement!
2. During Prepare, inactiveList add is done by copy (although one
could
argue that virPCIDeviceListAddCopy could be replaced by
VIR_APPEND_ELEMENT_COPY I suppose). So now there are *two* copies of the
same device - this is good and again, I think the model the activeList
should use.
Yes!
3. Current step9 just "steals" (or VIR_DELETE_ELEMENT) the
element from
pcidevs. This is "fine" since it's already/also on the activeList.
You'll note other users of the macro would unref/free whatever memory
was contained in the structure prior to removing from the list. Good
thing we don't do that here ;-).
As noted previously, the current code is correct, just *extremely*
hard to get right! So kudos to whoever managed such a feat :)
If we don't steal the entry, then
virObjectUnref(pcidevs) will call virPCIDeviceListDispose for us - your
patch22 change to remove this may not be right...
I believe I've already explained this in detail in my reply to your
comments on patch 22, so I won't repeat myself here. If you still
think this is unclear, just let me know.
So given the current state of things, here's my view on the
changes
assuming my VIR_APPEND_ELEMENT viewpoint:
Change #1: (what';s in the commit message)
Alters step8 to use activeList instead of pcidevs. This part looks
correct (perhaps worthy of it's own ACKable patch).
Change #2: (not in the commit message)
You jump into virHostdevReAttachPCIDevices and I then have to focus on a
different algorithm and step process. Here's where I definitely believe
you had patch22 in mind.
Yeah, I've already confirmed that was the case :)
But focusing on the code as it exists now... This code walks the
hostdev's and will find anything on the pcidevs list which was on the
former activeList and ensures the NetConfigRestore is run (e.g. the
corollary of step4 in Prepare).
Your change to ensure the device is on the inactiveList doesn't seem
right at this point in time of the code. When you insert step2 in
patch22, it would perhaps be more correct...
Yup.
Without peeking ahead too much, I think the insert unmanaged device
onto
inactiveList done in virHostdevReattachPCIDevice is incorrect at this
point. Even if not incorrect, it's confusing.
No, you're right, it's actually incorrect.
Thinking in terms of current code - step4 will take the everything
that
was formerly on the active list and:
1. Steal the pcidevs entry
2. Call virHostdevReattachPCIDevice with the stolen lookaside pcidevs
entry... While it appears to be an actual, it isn't. Of course, I think
you address that in patch 22, but trying not to get too far ahead.
2a. If the device is not managed, then add to the inactiveList and
return (call virPCIDeviceFree if unsuccessfully add). What really
confuses me here is why we place onto the inactiveList if not managed;
however, you address that in patch22. Still it could be addressed
earlier if that is a separate bug...
I would have to go back to patch 22 to make sure this is really
handled correctly there: I'm pretty confident it is, but I've
been wrong before! Most recently, while writing this series :(
The incorrect behaviour you've noticed before has actually been
introduced by patch 15. Before that, 'pcidevs' in
virHostdevReAttachPCIDevices() was obtained by copying instances
off the active list, hence virHostdevReattachPCIDevice() would
really be passed (a copy of) an actual; now that's no longer the
case, and the code has become incorrect.
Will fix as part of the respin.
2b. If the device is managed, call virPCIDeviceReattach
2b1. Fail if on activeList (miserable)
2b2. Unbind stub
2b3. Call virPCIDeviceListDel
-> Steal from inactiveList (search inactiveList for matching
device by domain address, bus slot of the passed lookside pci device)
-> call virPCIDeviceFree on the inactiveList entry
2c. Call virPCIDeviceFree on the lookaside list entry.
3. virObjectUnref(pcidevs)
-> If I read the code right - each entry was stolen and properly
removed so the virPCIDeviceListDispose has nothing to do.
So it seems at this point, I would surmise virPCIDeviceReattach expects
the pcidevs lookaside entry.
Nope, definitely wants the actual device.
The problem being insertion onto the
inactiveList of an unmanaged device (something we avoided in Prepare).
Well, unmanaged devices are supposed to end up in the inactive
list at the end of this function... They will be removed from
there by virHostdevPCINodeDeviceReAttach().
And yes, I'm still curious about virHostdevPCINodeDeviceDetach
and how
that plays into things.
What exactly is confusing you about that function?
So well, it's almost dinner time here, respin's coming tomorrow
I guess :)
Cheers.
--
Andrea Bolognani
Software Engineer - Virtualization Team