On Tue, 2016-01-26 at 18:55 -0500, John Ferlan wrote:
w/r/t: your [0/7] from initial series...
As much as you don't want to keep living Groundhog Day - resolution of
bugs like this are job security :-)...
Groundhog Day is in less than a week, by the way! :)
w/r/t Suggestions for deamon restart issues... Seems like we need a
way
to save/restore the hostdev_mgr active/inactive lists using XML/JSON
similar to how domains, storage, etc. handle it. Guess I just assumed
that was already there ;-) since /var/run/libvirt/hostdevmgr exists. It
seems that network stuff can be restored - virHostdevNetConfigRestore.
Do you really think this series should be "held up" waiting to create
some sort of status tracking?
I will look into your suggestion. I believe such save / restore
functionality has to be in place by the time this series is merged if
we don't want to break everything on daemon restart.
On 01/25/2016 11:20 AM, Andrea Bolognani wrote:
> This function replaces virHostdevReattachPCIDevice() and, unlike it,
> does not perform list manipulation, leaving it to the calling function.
>
> This means virHostdevReAttachPCIDevices() had to be updated to cope
> with the updated requirements.
> ---
> src/util/virhostdev.c | 136 +++++++++++++++++++++++++++++++++-----------------
> 1 file changed, 90 insertions(+), 46 deletions(-)
Since I reviewed them all... I think the comment changes from 7/9 should
just be inlined here and patch 4 instead of a separate patch
Will do - it was that way in v1 as well.
> diff --git a/src/util/virhostdev.c b/src/util/virhostdev.c
> index f31ad41..66629b4 100644
> --- a/src/util/virhostdev.c
> +++ b/src/util/virhostdev.c
> @@ -526,6 +526,74 @@ virHostdevNetConfigRestore(virDomainHostdevDefPtr hostdev,
> return ret;
> }
>
> +/**
> + * virHostdevOnlyReattachPCIDevice:
Why not just reuse the function name you deleted? IOW: Is there a reason
for "Only"? (not that I'm one that can complain about naming functions,
but this just seems strange).
It's an attempt to make it stand out a bit from
virHostdevPCINodeDeviceReAttach()
virHostdevReAttachPCIDevices()
in the same file. Mostly the latter.
The reasoning behind "Only" is that the function performs "Only" the
job
of reattaching the device to the host, with the error checking,
bookkeeping and additional steps left to the caller.
Which is, strictly speaking, not true :)
Maybe something like virHostdevReattachPCIDeviceCommon(), to express the
fact that this basically contains as much functionality as it was
possible to split off to a reusable routine?
> + * @mgr: hostdev manager
> + * @pci: PCI device to be reattached
Interesting ... In 2 instances, this will be a pointer to the "copy"
element, while in the third instance this will be the "actual" on
inactive list element. For a copy element, we'd *have* to search
inactive; however, for an 'actual' we don't "need" to.
Good point.
I will try to find a solution that
1. avoids searching the list twice
2. avoids duplicating code
3. respects the Principle of Least Surprise
I can't guarantee I'll be able to :)
> + * @skipUnmanaged: whether to skip unmanaged devices
> + *
> + * Reattach a PCI device to the host.
> + *
> + * This function only performs the base reattach steps that are required
> + * regardless of whether the device is being detached from a domain or
> + * had been simply detached from the host earlier.
> + *
> + * @pci must have already been marked as inactive, and the PCI related
> + * parts of @mgr (inactivePCIHostdevs, activePCIHostdevs) must have been
> + * locked beforehand using virObjectLock().
> + *
> + * Returns: 0 on success, <0 on failure
> + */
> +static int
> +virHostdevOnlyReattachPCIDevice(virHostdevManagerPtr mgr,
> + virPCIDevicePtr pci,
> + bool skipUnmanaged)
> +{
> + virPCIDevicePtr actual;
> + int ret = -1;
> +
> + /* Retrieve the actual device from the inactive list */
> + if (!(actual = virPCIDeviceListFind(mgr->inactivePCIHostdevs, pci))) {
> + VIR_DEBUG("PCI device %s is not marked as inactive",
> + virPCIDeviceGetName(pci));
This is tricky - the only time we care about the return status (now) is
when skipUnmanaged == false, a/k/a the path where we pass the onlist
element..
In my first pass through the changes I thought - oh no if we return -1,
then this would be a path that could get that generic libvirt failed for
some reason message; however, closer inspection noted that we guaranteed
it was on the inactive list before calling here.
So we should be good, right? :)
> + goto out;
> + }
> +
> + /* Skip unmanaged devices if asked to do so */
> + if (!virPCIDeviceGetManaged(actual) && skipUnmanaged) {
<sigh>, unrelated and existing - virPCIDeviceGetManaged probably should
return bool instead of unsigned int
Yup, good catch. The same applies to
virPCIDeviceGetUnbindFromStub()
virPCIDeviceGetRemoveSlot()
virPCIDeviceGetReprobe()
as well. I'll fix them in a separate commit.
> + VIR_DEBUG("Not reattaching unmanaged PCI device
%s",
> + virPCIDeviceGetName(actual));
> + ret = 0;
> + goto out;
> + }
> +
> + /* Wait for device cleanup if it is qemu/kvm */
> + if (virPCIDeviceGetStubDriver(actual) == VIR_PCI_STUB_DRIVER_KVM) {
> + int retries = 100;
> + while (virPCIDeviceWaitForCleanup(actual, "kvm_assigned_device")
> + && retries) {
> + usleep(100*1000);
> + retries--;
> + }
> + }
Existing, but if retries == 0, then cleanup never succeeded; however,
perhaps one can assume that the subsequent call would fall over and play
dead? Not that it really gets checked...
I recall raising the issue at some point in the past, but I don't
remember the outcome of that discussion... Maybe this can be assessed
again at a later time?
> +
> + VIR_DEBUG("Reattaching PCI device %s", virPCIDeviceGetName(actual));
> + if (virPCIDeviceReattach(actual, mgr->activePCIHostdevs,
> + mgr->inactivePCIHostdevs) < 0) {
> + virErrorPtr err = virGetLastError();
> + VIR_ERROR(_("Failed to reattach PCI device %s: %s"),
> + virPCIDeviceGetName(actual),
> + err ? err->message : _("unknown error"));
> + virResetError(err);
> + goto out;
> + }
> +
> + ret = 0;
> +
> + out:
> + return ret;
> +}
> +
> int
> virHostdevPreparePCIDevices(virHostdevManagerPtr hostdev_mgr,
> const char *drv_name,
> @@ -753,45 +821,6 @@ virHostdevPreparePCIDevices(virHostdevManagerPtr hostdev_mgr,
> return ret;
> }
>
> -/*
> - * Pre-condition: inactivePCIHostdevs & activePCIHostdevs
> - * are locked
> - */
> -static void
> -virHostdevReattachPCIDevice(virPCIDevicePtr dev, virHostdevManagerPtr mgr)
> -{
> - /* If the device is not managed and was attached to guest
> - * successfully, it must have been inactive.
> - */
> - if (!virPCIDeviceGetManaged(dev)) {
> - VIR_DEBUG("Adding unmanaged PCI device %s to inactive list",
> - virPCIDeviceGetName(dev));
> - if (virPCIDeviceListAdd(mgr->inactivePCIHostdevs, dev) < 0)
> - virPCIDeviceFree(dev);
> - return;
> - }
> -
> - /* Wait for device cleanup if it is qemu/kvm */
> - if (virPCIDeviceGetStubDriver(dev) == VIR_PCI_STUB_DRIVER_KVM) {
> - int retries = 100;
> - while (virPCIDeviceWaitForCleanup(dev, "kvm_assigned_device")
> - && retries) {
> - usleep(100*1000);
> - retries--;
> - }
> - }
> -
> - VIR_DEBUG("Reattaching PCI device %s", virPCIDeviceGetName(dev));
> - if (virPCIDeviceReattach(dev, mgr->activePCIHostdevs,
> - mgr->inactivePCIHostdevs) < 0) {
> - virErrorPtr err = virGetLastError();
> - VIR_ERROR(_("Failed to re-attach PCI device: %s"),
> - err ? err->message : _("unknown error"));
> - virResetError(err);
> - }
> - virPCIDeviceFree(dev);
> -}
> -
> /* @oldStateDir:
> * For upgrade purpose: see virHostdevNetConfigRestore
> */
> @@ -803,7 +832,7 @@ virHostdevReAttachPCIDevices(virHostdevManagerPtr hostdev_mgr,
> int nhostdevs,
> const char *oldStateDir)
> {
> - virPCIDeviceListPtr pcidevs;
> + virPCIDeviceListPtr pcidevs = NULL;
> size_t i;
>
> if (!nhostdevs)
> @@ -848,11 +877,25 @@ virHostdevReAttachPCIDevices(virHostdevManagerPtr
hostdev_mgr,
> continue;
> }
> }
> + i++;
> + }
Curious why the decision for a second loop - if activeDev matches, then
it almost seems a shame to loop again. Why not (within if (activeDev):
actual = virPCIDeviceListSteal(hostdev_mgr->activePCIHostdevs,
activeDev);
/* !actual should never happen, but better safe than sorry */
if (actual &&
virPCIDeviceListAdd(hostdev_mgr->inactivePCIHostdevs,
actual) < 0)
virPCIDeviceFree(actual);
/* You could also... */
virPCIDeviceListDel(pcidevs, dev);
}
Mostly because I consider moving the devices from one list to another
as a separate step.
We could merge the two steps, and that would bring us down to 4 (or 5
if you count the one implicit in virHostdevGetActivePCIHostDeviceList())
loops, but I'm not sure whether that would be a significant gain in
performance or whether it would just make the code a little more
convoluted...
NOTE: Whether there is one or two loops, the second loop would need
to
call virPCIDeviceFree(actual) since we'd leak it otherwise.
You mean on error? Because otherwise I don't see the leak: the actual
device is stolen from the active list and added (itself, not a copy)
to the inactive list.
You'll also note I didn't keep the goto cleanup. Previously
the code
would completely go through the pcidevs Loop 4 regardless of whether
virHostdevReattachPCIDevice code had failures. The new code has the
subtle difference of jumping to cleanup if a failure is found. That
could leave things in an awkward state especially since
virHostdevReAttachPCIDevices is a void.
Since failure for DeviceListAdd is because 1. device is already there
(which I would *hope* isn't the case) or 2. memory allocation failure
(in which case not much else successful will probably happen anyway),
then perhaps continuing on rather than jumping to cleanup isn't a bad
idea... We could be returning some memory that someone may find useful.
My concerns about jumping to cleanup are that this API is called in
error recovery paths as well as part of the ominous comment "For upgrade
purpose:..." (comment before function header). So it seems the "existing
logic" is try to clean up as many as possible. By potentially erroring
out too soon could lead to more problems.
So the question becomes what havoc is introduced if we cannot add to the
inactive list but decide to continue as before... It seems we'll end up
"failing" in virHostdevOnlyReattachPCIDevice since it's not in the
inactiveList, but our Loop 4 logic doesn't care. Of course we could
delete 'dev' from 'pcidevs' too before then...
Hopefully this makes sense... It's been an 'edit in process'...
See the comment at the end of the message.
> +
> + /* Step 2: move all devices from the active list to the inactive list */
> + for (i = 0; i < virPCIDeviceListCount(pcidevs); i++) {
> + virPCIDevicePtr dev = virPCIDeviceListGet(pcidevs, i);
> + virPCIDevicePtr actual;
>
> VIR_DEBUG("Removing PCI device %s from active list",
> virPCIDeviceGetName(dev));
> - virPCIDeviceListDel(hostdev_mgr->activePCIHostdevs, dev);
This was a curious placement for *ListDel... If !activeDev, then calling
*ListDel also won't find 'dev' on activelist...
If 'activeDev != NULL', then driver name and domain name are checked,
which may cause the 'dev' to be removed from 'pcidev' and the loop to
restart.
If that does not happen 'dev' is removed from the active list, even
thought it might not have been in that list in the first place. But
the code is doing the right thing in all situations.
> - i++;
> + if (!(actual = virPCIDeviceListSteal(hostdev_mgr->activePCIHostdevs,
> + dev)))
> + goto cleanup;
If the choice is to use two loops (and perhaps keep the cleanups)...
1. If this Steal fails, then something is seriously wrong, but we don't
even give it a VIR_DEBUG message.
2. If this Steal fails, then going to cleanup is again a subtle
difference with the prior logic that said, well I couldn't do anything
with this, but I'm going to keep processing.
3. If we keep processing, then something on 'pcidevs' won't be in
'inactivePCIHostdevs', causing virHostdevOnlyReattachPCIDevice to fail.
But that does not matter since we ignore the return value in Loop 4.
4. If we do Steal and if the subsequent Add fails, then we leak
'actual', so prior to the goto cleanup call virPCIDeviceFree(actual);
(or instead if the goto cleanup;'s are removed).
Thanks for looking into this in such detail. I will go through the
existing code again myself and either become confident that the code
is doing the right thing or change it so that it does :)
On the other hand, there's this patch I'm working on that changes the
way bookkeeping is performed quite substantially... My idea was to
propose it as a follow-up to this series, since it basically replaces
some constructs with some other "equivalent" constructs without altering
the overall control flow, but maybe at this point it could be worth it
to merge everything together and hopefully avoid such pitfalls, and make
the whole thing easier to reason about.
Cheers.
--
Andrea Bolognani
Software Engineer - Virtualization Team