[libvirt] [PATCH] virhostdev: Fix PCI devices are still attatched to stub driver bug

newer
[libvirt] what determines the type...

Wu Zongyong

30 Aug 2018 30 Aug '18

11:34 p.m.

Currently, PCI devices will not be rebound to host drivers but attached to the stub driver when: 1) use libvirt to start a virtual machine with PCI devices assigned, then stop libvirtd process and shutdown the virtual machine. Finally, PCI devices are still bound to the stub driver instead of host drivers after libvirt start again. 2) use libvirt to shutdown a virtual machine wtih PCI devices assigned, then stop libvirtd process before libvirt try to rebind PCI devices to host drivers. Finally, PCI devices are still bound to the stub driver after libvirt start again. Notice that the comment on the top of virPCIDeviceDetach as follows: activeDevs should be a list of all PCI devices currently in use by a domain.inactiveDevs is a list of all PCI devices that libvirt has detached from the host driver + attached to the stub driver, but hasn't yet assigned to a domain. It's not reasonable that libvirt filter out devices that are either not active or not used by the current domain and driver. For devices belong to domains that has been shutdown before libvirt start, we should put them into inactiveDevs if then meet the condition that PCI devices have been detached from the host driver + attached to the stub driver. Moreover, we set orignal states of PCI devices when build PCI devices list in virHostdevGetPCIHostDeviceList if states has been set in struct virDomainHostdevDefPtr. Signed-off-by: Wu Zongyong <cordius.wu@huawei.com> --- src/util/virhostdev.c | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-) diff --git a/src/util/virhostdev.c b/src/util/virhostdev.c index ca79c37..ecf95e3 100644 --- a/src/util/virhostdev.c +++ b/src/util/virhostdev.c @@ -235,6 +235,7 @@ virHostdevGetPCIHostDeviceList(virDomainHostdevDefPtr *hostdevs, int nhostdevs) for (i = 0; i < nhostdevs; i++) { virDomainHostdevDefPtr hostdev = hostdevs[i]; virDomainHostdevSubsysPCIPtr pcisrc = &hostdev->source.subsys.u.pci; + virDomainHostdevOrigStatesPtr origstates = &hostdev->origstates; virPCIDevicePtr pci; if (hostdev->mode != VIR_DOMAIN_HOSTDEV_MODE_SUBSYS) @@ -262,6 +263,10 @@ virHostdevGetPCIHostDeviceList(virDomainHostdevDefPtr *hostdevs, int nhostdevs) virPCIDeviceSetStubDriver(pci, VIR_PCI_STUB_DRIVER_XEN); else virPCIDeviceSetStubDriver(pci, VIR_PCI_STUB_DRIVER_KVM); + + virPCIDeviceSetUnbindFromStub(pci, origstates->states.pci.unbind_from_stub); + virPCIDeviceSetRemoveSlot(pci, origstates->states.pci.remove_slot); + virPCIDeviceSetReprobe(pci, origstates->states.pci.reprobe); } return pcidevs; @@ -1008,8 +1013,19 @@ virHostdevReAttachPCIDevices(virHostdevManagerPtr mgr, continue; } } else { - virPCIDeviceListDel(pcidevs, pci); - continue; + int stub = virPCIDeviceGetStubDriver(pci); + if (stub > VIR_PCI_STUB_DRIVER_NONE && + stub < VIR_PCI_STUB_DRIVER_LAST) { + /* The device is bound to a known stub driver: add a copy + * to the inactive list */ + VIR_DEBUG("Adding PCI device %s to inactive list", + virPCIDeviceGetName(pci)); + if (virPCIDeviceListAddCopy(mgr->inactivePCIHostdevs, pci) < 0) { + VIR_ERROR(_("Failed to add PCI device %s to the inactive list"), + virGetLastErrorMessage()); + virResetLastError(); + } + } } i++; -- 1.9.1

Show replies by date

John Ferlan

6 Sep 6 Sep

2:24 p.m.

$SUBJ: s/attatched/attached s/bug// On 08/31/2018 03:34 AM, Wu Zongyong wrote: So, first off - I believe there are two things going on in this one patch. Even though there is "some relationship" between the issues, the libvirtd restart is kind of a corner case, while the change to readd the device to inactiveDev's would affect both this libvirtd restart case *and* the normal processing. It's taken me some time to come to that conclusion - lots of details follow, but I'm also willing to accept the changes are "related enough" for just one patch.

...

Currently, PCI devices will not be rebound to host drivers but attached to the stub driver when:

1) use libvirt to start a virtual machine with PCI devices assigned, then stop libvirtd process and shutdown the virtual machine. Finally, PCI devices are still bound to the stub driver instead of host drivers after libvirt start again. 2) use libvirt to shutdown a virtual machine wtih PCI devices assigned,

s/wiht/with

...

then stop libvirtd process before libvirt try to rebind PCI devices to

Be specific which API isn't being called - it really will help. Making a reviewer figure it out in very specific/particular cases such as this leads to "review delay". Is qemuHostdevReAttachDomainDevices called from qemuProcessStop what you were referring to? The Stop not getting called because the monitor channel is closed and thus the event for shutdown won't be triggered.

...

host drivers. Finally, PCI devices are still bound to the stub driver after libvirt start again.

Since you've done the research and so that I'm sure I'm following your logic - it seems like you're chasing corner cases based on libvirtd restart. In one instance the order is: virsh start $dom service libvirtd stop (or some other means?) <shutdown> the guest from within the $dom and the other instance is virsh start @dom virsh destroy $dom service libvirtd stop and issues as a result of the subsequent "service libvirtd start". In the second instance it's not clear how your timing is "just right" such that libvirtd can be stopped before the PCI devices can be returned to the host PCI, but it's essentially the same as the first scenario at least with respect to the subsequent libvirtd restart and discovery that the guest is no longer running and needing to process that. What may also have helped is if the commit message indicated the "normal" path taken that would allow the drivers to be rebound properly. Even if it's after the "---" for review backup assistance information. In edge cases like this, it's far easier to cut out stuff from a commit message than it is to put yourself into the frame of reference of the patch submitter Still what your patch does is use the virDomainHostdevOrigStatesPtr for a domain that was shutdown, but when libvirtd restarts. I had to go a long way back to find when the OrigStates code was added - commit d84b36263 in particular. In that case, the order was start domain, restart libvirtd, and destroy domain. So the slight difference to the condition for this patch is the domain was shutdown while libvirtd was stopped. In any case, all of the above is "one issue"... Leaving the rest as a "second issue"

...

Notice that the comment on the top of virPCIDeviceDetach as follows:

activeDevs should be a list of all PCI devices currently in use by a domain.inactiveDevs is a list of all PCI devices that libvirt has

* activeDevs is a list ... * inactiveDevs is a list ...

...

detached from the host driver + attached to the stub driver, but hasn't yet assigned to a domain.

It's not reasonable that libvirt filter out devices that are either not active or not used by the current domain and driver. For devices belong

s/belong/that belong/

...

to domains that has been shutdown before libvirt start, we should put

s/has/have s/libvirt start, we should put them/libvirtd starts, they should be placed

...

them into inactiveDevs if then meet the condition that PCI devices have

I'm finding it hard to read/parse/rewrite the rest of this sentence.

...

been detached from the host driver + attached to the stub driver.

Moreover, we set orignal states of PCI devices when build PCI devices

s/we set orignal/set the original/ s/build/building/

...

list in virHostdevGetPCIHostDeviceList if states has been set in struct virDomainHostdevDefPtr.

So the "second issue" (as I see it) is that ReAttach processing wasn't handling returning devices that had stub driver attachment (at some point in time). This issue occurs regardless of the libvirtd restart and domain shutdown processing order. Even though it's related, it's different and would need its own patch.

...

Signed-off-by: Wu Zongyong <cordius.wu@huawei.com> --- src/util/virhostdev.c | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/src/util/virhostdev.c b/src/util/virhostdev.c index ca79c37..ecf95e3 100644 --- a/src/util/virhostdev.c +++ b/src/util/virhostdev.c @@ -235,6 +235,7 @@ virHostdevGetPCIHostDeviceList(virDomainHostdevDefPtr *hostdevs, int nhostdevs) for (i = 0; i < nhostdevs; i++) { virDomainHostdevDefPtr hostdev = hostdevs[i]; virDomainHostdevSubsysPCIPtr pcisrc = &hostdev->source.subsys.u.pci; + virDomainHostdevOrigStatesPtr origstates = &hostdev->origstates; virPCIDevicePtr pci;

if (hostdev->mode != VIR_DOMAIN_HOSTDEV_MODE_SUBSYS) @@ -262,6 +263,10 @@ virHostdevGetPCIHostDeviceList(virDomainHostdevDefPtr *hostdevs, int nhostdevs) virPCIDeviceSetStubDriver(pci, VIR_PCI_STUB_DRIVER_XEN); else virPCIDeviceSetStubDriver(pci, VIR_PCI_STUB_DRIVER_KVM); + + virPCIDeviceSetUnbindFromStub(pci, origstates->states.pci.unbind_from_stub); + virPCIDeviceSetRemoveSlot(pci, origstates->states.pci.remove_slot); + virPCIDeviceSetReprobe(pci, origstates->states.pci.reprobe);

This looks a lot like what virHostdevUpdateActivePCIDevices is doing - in fact I think that is the libvirtd restart "bug" you're chasing - qemuProcessReconnect will call qemuHostdevUpdateActiveDomainDevices after the call to qemuConnectMonitor - at least w/r/t this PCI stub interaction. Since the domain was stopped connecting to the monitor will fail (in my case qemuMonitorOpenUnix), thus falling into qemuProcessStop, but without having updated (or ReAttach'ing) the PCI devices. So, in your environment without these two changes - could you check to see what happens if the order is changed? Does that fix the issue? That is move the call to qemuHostdevUpdateActiveDomainDevices to right after qemuProcessPrepareAllowReboot. Some more "detailed" thoughts I had on this while investigating... Both virHostdevPreparePCIDevices and virHostdevReAttachPCIDevices will call virHostdevGetPCIHostDeviceList. However, in the case of the former, "Step 7" is "set the original states for hostdev def". Following call traces for virHostdevPreparePCIDevices eventually leads back to : 1. qemuMigrationDstPrepareAny (migration) 2. qemuProcessStart (guest startup) 3. qemuDomainAttachHostDevice (hotplug). IOW: This is the "normal" running mode. So I have a concern regarding the "order of operation" in this case especially since step 7 has the caveat of (actual) being true before setting the bools. Considering the "other pah" through virHostdevReAttachPCIDevices, callers are: 1. libxlDomainAttachHostPCIDevice (an error: path) 2. libxlDomainDetachHostPCIDevice (normal path) 3. qemuHostdevReAttachPCIDevices (normal path) 4. virHostdevReAttachDomainDevices (libxl domain close processing) Focusing on qemuHostdevReAttachPCIDevices I find 3 callers: 1. qemuHostdevReAttachDomainDevices (called by qemuProcessStop) 2. qemuDomainAttachHostPCIDevice (hotplug error processing) 3. qemuDomainRemovePCIHostDevice (hot unplug processing). So while setting things up seems to be the right thing to do in the ReAttach case, the normal attach case it doesn't seem so. Perhaps too generic.

...

}

return pcidevs; @@ -1008,8 +1013,19 @@ virHostdevReAttachPCIDevices(virHostdevManagerPtr mgr, continue; } } else { - virPCIDeviceListDel(pcidevs, pci); - continue;

The above was added in commit 4cdbff3d5 by Andrea and just removed inactiveDev's from the @pcidevs since his change added both onto @pcidevs (at least that's my reading of it). But with your proposed change rather than deleting that from @pcidevs, you're moving it to the inactiveDev's list *if* there's a stub. So what happens if there isn't a stub? From this point forward @pcidevs would contain both active and inactive devices. Then step 2 comes along and runs the @pcidevs "stealing" to place the device on the inactiveDevs list. So now wouldn't we have duplicate devices there? Now if the new code went before the above 2 lines, then probably no big deal - that way we're adding back to the inactive list for these stub drivers and we're not keeping the device in @pcidevs meaning it wouldn't be readded to inactive afterwards. Hopefully this makes sense.

...

+ int stub = virPCIDeviceGetStubDriver(pci); + if (stub > VIR_PCI_STUB_DRIVER_NONE && + stub < VIR_PCI_STUB_DRIVER_LAST) { + /* The device is bound to a known stub driver: add a copy + * to the inactive list */ + VIR_DEBUG("Adding PCI device %s to inactive list",

s/%s/'%s'/ (both above and below)

...

+ virPCIDeviceGetName(pci)); + if (virPCIDeviceListAddCopy(mgr->inactivePCIHostdevs, pci) < 0) { + VIR_ERROR(_("Failed to add PCI device %s to the inactive list"), + virGetLastErrorMessage()); + virResetLastError(); + } + }

For reference, this is a copy of what's in virHostdevPreparePCIDevices step 2. I think you need to do some extra debugging and be sure you're not duplicating things. IOW: Add some debug code to dump the active and inactive lists. Although this is really long - I hope it all makes sense. John

...

}

i++;

Wuzongyong (Euler Dept)

20 Sep 20 Sep

2:35 a.m.

...

$SUBJ:

s/attatched/attached

s/bug//

On 08/31/2018 03:34 AM, Wu Zongyong wrote:

So, first off - I believe there are two things going on in this one patch. Even though there is "some relationship" between the issues, the libvirtd restart is kind of a corner case, while the change to readd the device to inactiveDev's would affect both this libvirtd restart case *and* the normal processing. It's taken me some time to come to that conclusion - lots of details follow, but I'm also willing to accept the changes are "related enough" for just one patch.

...
Currently, PCI devices will not be rebound to host drivers but attached to the stub driver when:

1) use libvirt to start a virtual machine with PCI devices assigned, then stop libvirtd process and shutdown the virtual machine. Finally, PCI devices are still bound to the stub driver instead of host drivers after libvirt start again. 2) use libvirt to shutdown a virtual machine wtih PCI devices assigned,

s/wiht/with

...
then stop libvirtd process before libvirt try to rebind PCI devices to

Be specific which API isn't being called - it really will help. Making a reviewer figure it out in very specific/particular cases such as this leads to "review delay". Is qemuHostdevReAttachDomainDevices called from qemuProcessStop what you were referring to? The Stop not getting called because the monitor channel is closed and thus the event for shutdown won't be triggered.

...
host drivers. Finally, PCI devices are still bound to the stub driver after libvirt start again.

Since you've done the research and so that I'm sure I'm following your logic - it seems like you're chasing corner cases based on libvirtd restart. In one instance the order is:

virsh start $dom service libvirtd stop (or some other means?) <shutdown> the guest from within the $dom

and the other instance is

virsh start @dom virsh destroy $dom service libvirtd stop

and issues as a result of the subsequent "service libvirtd start".

In the second instance it's not clear how your timing is "just right" such that libvirtd can be stopped before the PCI devices can be returned to the host PCI, but it's essentially the same as the first scenario at least with respect to the subsequent libvirtd restart and discovery that the guest is no longer running and needing to process that.

What may also have helped is if the commit message indicated the "normal" path taken that would allow the drivers to be rebound properly. Even if it's after the "---" for review backup assistance information.

In edge cases like this, it's far easier to cut out stuff from a commit message than it is to put yourself into the frame of reference of the patch submitter

Still what your patch does is use the virDomainHostdevOrigStatesPtr for a domain that was shutdown, but when libvirtd restarts. I had to go a long way back to find when the OrigStates code was added - commit d84b36263 in particular. In that case, the order was start domain, restart libvirtd, and destroy domain.

So the slight difference to the condition for this patch is the domain was shutdown while libvirtd was stopped.

In any case, all of the above is "one issue"...

Leaving the rest as a "second issue"

...
Notice that the comment on the top of virPCIDeviceDetach as follows:

activeDevs should be a list of all PCI devices currently in use by a domain.inactiveDevs is a list of all PCI devices that libvirt has

* activeDevs is a list ... * inactiveDevs is a list ...

...
detached from the host driver + attached to the stub driver, but hasn't yet assigned to a domain.

It's not reasonable that libvirt filter out devices that are either not active or not used by the current domain and driver. For devices belong

s/belong/that belong/

...
to domains that has been shutdown before libvirt start, we should put

s/has/have

s/libvirt start, we should put them/libvirtd starts, they should be placed

...
them into inactiveDevs if then meet the condition that PCI devices have

I'm finding it hard to read/parse/rewrite the rest of this sentence.

...
been detached from the host driver + attached to the stub driver.

Moreover, we set orignal states of PCI devices when build PCI devices

s/we set orignal/set the original/

s/build/building/

...
list in virHostdevGetPCIHostDeviceList if states has been set in struct virDomainHostdevDefPtr.

So the "second issue" (as I see it) is that ReAttach processing wasn't handling returning devices that had stub driver attachment (at some point in time). This issue occurs regardless of the libvirtd restart and domain shutdown processing order. Even though it's related, it's different and would need its own patch.

...
Signed-off-by: Wu Zongyong <cordius.wu@huawei.com> --- src/util/virhostdev.c | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/src/util/virhostdev.c b/src/util/virhostdev.c index ca79c37..ecf95e3 100644 --- a/src/util/virhostdev.c +++ b/src/util/virhostdev.c @@ -235,6 +235,7 @@

virHostdevGetPCIHostDeviceList(virDomainHostdevDefPtr *hostdevs, int nhostdevs)

...
for (i = 0; i < nhostdevs; i++) { virDomainHostdevDefPtr hostdev = hostdevs[i]; virDomainHostdevSubsysPCIPtr pcisrc = &hostdev->source.subsys.u.pci; + virDomainHostdevOrigStatesPtr origstates = + &hostdev->origstates; virPCIDevicePtr pci;

if (hostdev->mode != VIR_DOMAIN_HOSTDEV_MODE_SUBSYS) @@ -262,6 +263,10 @@ virHostdevGetPCIHostDeviceList(virDomainHostdevDefPtr

*hostdevs, int nhostdevs)

...
virPCIDeviceSetStubDriver(pci, VIR_PCI_STUB_DRIVER_XEN); else virPCIDeviceSetStubDriver(pci, VIR_PCI_STUB_DRIVER_KVM); + + virPCIDeviceSetUnbindFromStub(pci, origstates- states.pci.unbind_from_stub); + virPCIDeviceSetRemoveSlot(pci, origstates- states.pci.remove_slot); + virPCIDeviceSetReprobe(pci, origstates->states.pci.reprobe);

This looks a lot like what virHostdevUpdateActivePCIDevices is doing - in fact I think that is the libvirtd restart "bug" you're chasing - qemuProcessReconnect will call qemuHostdevUpdateActiveDomainDevices after the call to qemuConnectMonitor - at least w/r/t this PCI stub interaction.

Since the domain was stopped connecting to the monitor will fail (in my case qemuMonitorOpenUnix), thus falling into qemuProcessStop, but without having updated (or ReAttach'ing) the PCI devices. So, in your environment without these two changes - could you check to see what happens if the order is changed? Does that fix the issue?

That is move the call to qemuHostdevUpdateActiveDomainDevices to right after qemuProcessPrepareAllowReboot.

Some more "detailed" thoughts I had on this while investigating...

Both virHostdevPreparePCIDevices and virHostdevReAttachPCIDevices will call virHostdevGetPCIHostDeviceList. However, in the case of the former, "Step 7" is "set the original states for hostdev def".

Following call traces for virHostdevPreparePCIDevices eventually leads back to :

1. qemuMigrationDstPrepareAny (migration) 2. qemuProcessStart (guest startup) 3. qemuDomainAttachHostDevice (hotplug).

IOW: This is the "normal" running mode.

So I have a concern regarding the "order of operation" in this case especially since step 7 has the caveat of (actual) being true before setting the bools.

Considering the "other pah" through virHostdevReAttachPCIDevices, callers are:

1. libxlDomainAttachHostPCIDevice (an error: path) 2. libxlDomainDetachHostPCIDevice (normal path) 3. qemuHostdevReAttachPCIDevices (normal path) 4. virHostdevReAttachDomainDevices (libxl domain close processing)

Focusing on qemuHostdevReAttachPCIDevices I find 3 callers:

1. qemuHostdevReAttachDomainDevices (called by qemuProcessStop) 2. qemuDomainAttachHostPCIDevice (hotplug error processing) 3. qemuDomainRemovePCIHostDevice (hot unplug processing).

So while setting things up seems to be the right thing to do in the ReAttach case, the normal attach case it doesn't seem so. Perhaps too generic.

...
}

return pcidevs; @@ -1008,8 +1013,19 @@ virHostdevReAttachPCIDevices(virHostdevManagerPtr

mgr,

...
continue; } } else { - virPCIDeviceListDel(pcidevs, pci); - continue;

The above was added in commit 4cdbff3d5 by Andrea and just removed inactiveDev's from the @pcidevs since his change added both onto @pcidevs (at least that's my reading of it).

But with your proposed change rather than deleting that from @pcidevs, you're moving it to the inactiveDev's list *if* there's a stub.

So what happens if there isn't a stub? From this point forward @pcidevs would contain both active and inactive devices.

Then step 2 comes along and runs the @pcidevs "stealing" to place the device on the inactiveDevs list. So now wouldn't we have duplicate devices there?

Now if the new code went before the above 2 lines, then probably no big deal - that way we're adding back to the inactive list for these stub drivers and we're not keeping the device in @pcidevs meaning it wouldn't be readded to inactive afterwards.

Hopefully this makes sense.

...
+ int stub = virPCIDeviceGetStubDriver(pci); + if (stub > VIR_PCI_STUB_DRIVER_NONE && + stub < VIR_PCI_STUB_DRIVER_LAST) { + /* The device is bound to a known stub driver: add a copy + * to the inactive list */ + VIR_DEBUG("Adding PCI device %s to inactive list",

s/%s/'%s'/

(both above and below)

...
+ virPCIDeviceGetName(pci)); + if (virPCIDeviceListAddCopy(mgr->inactivePCIHostdevs, pci) < 0) { + VIR_ERROR(_("Failed to add PCI device %s to the inactive list"), + virGetLastErrorMessage()); + virResetLastError(); + } + }

For reference, this is a copy of what's in virHostdevPreparePCIDevices step 2. I think you need to do some extra debugging and be sure you're not duplicating things. IOW: Add some debug code to dump the active and inactive lists.

Although this is really long - I hope it all makes sense.

John Very sorry for the late, I have been occupied with a new project these days. Very sorry. Also thanks for correcting my poor English.

Back to the question, you suggested to move the call to qemuHostdevUpdateActiveDomainDevices to right after qemuProcessPrepareAllowReboot. Indeed, adjusting the order can fix the problem. So, maybe we don't need to talk other aspect You mentioned? All you mentioned really make sense and I don't consider so much indeed. Thanks, Zongyong Wu

John Ferlan

7:22 a.m.

On 09/20/2018 06:35 AM, Wuzongyong (Euler Dept) wrote:

...

...
$SUBJ:

s/attatched/attached

s/bug//

On 08/31/2018 03:34 AM, Wu Zongyong wrote:

So, first off - I believe there are two things going on in this one patch. Even though there is "some relationship" between the issues, the libvirtd restart is kind of a corner case, while the change to readd the device to inactiveDev's would affect both this libvirtd restart case *and* the normal processing. It's taken me some time to come to that conclusion - lots of details follow, but I'm also willing to accept the changes are "related enough" for just one patch.

...
Currently, PCI devices will not be rebound to host drivers but attached to the stub driver when:

1) use libvirt to start a virtual machine with PCI devices assigned, then stop libvirtd process and shutdown the virtual machine. Finally, PCI devices are still bound to the stub driver instead of host drivers after libvirt start again. 2) use libvirt to shutdown a virtual machine wtih PCI devices assigned,

s/wiht/with

...
then stop libvirtd process before libvirt try to rebind PCI devices to

Be specific which API isn't being called - it really will help. Making a reviewer figure it out in very specific/particular cases such as this leads to "review delay". Is qemuHostdevReAttachDomainDevices called from qemuProcessStop what you were referring to? The Stop not getting called because the monitor channel is closed and thus the event for shutdown won't be triggered.

...
host drivers. Finally, PCI devices are still bound to the stub driver after libvirt start again.

Since you've done the research and so that I'm sure I'm following your logic - it seems like you're chasing corner cases based on libvirtd restart. In one instance the order is:

virsh start $dom service libvirtd stop (or some other means?) <shutdown> the guest from within the $dom

and the other instance is

virsh start @dom virsh destroy $dom service libvirtd stop

and issues as a result of the subsequent "service libvirtd start".

In the second instance it's not clear how your timing is "just right" such that libvirtd can be stopped before the PCI devices can be returned to the host PCI, but it's essentially the same as the first scenario at least with respect to the subsequent libvirtd restart and discovery that the guest is no longer running and needing to process that.

What may also have helped is if the commit message indicated the "normal" path taken that would allow the drivers to be rebound properly. Even if it's after the "---" for review backup assistance information.

In edge cases like this, it's far easier to cut out stuff from a commit message than it is to put yourself into the frame of reference of the patch submitter

Still what your patch does is use the virDomainHostdevOrigStatesPtr for a domain that was shutdown, but when libvirtd restarts. I had to go a long way back to find when the OrigStates code was added - commit d84b36263 in particular. In that case, the order was start domain, restart libvirtd, and destroy domain.

So the slight difference to the condition for this patch is the domain was shutdown while libvirtd was stopped.

In any case, all of the above is "one issue"...

Leaving the rest as a "second issue"

...
Notice that the comment on the top of virPCIDeviceDetach as follows:

activeDevs should be a list of all PCI devices currently in use by a domain.inactiveDevs is a list of all PCI devices that libvirt has

* activeDevs is a list ... * inactiveDevs is a list ...

...
detached from the host driver + attached to the stub driver, but hasn't yet assigned to a domain.

It's not reasonable that libvirt filter out devices that are either not active or not used by the current domain and driver. For devices belong

s/belong/that belong/

...
to domains that has been shutdown before libvirt start, we should put

s/has/have

s/libvirt start, we should put them/libvirtd starts, they should be placed

...
them into inactiveDevs if then meet the condition that PCI devices have

I'm finding it hard to read/parse/rewrite the rest of this sentence.

...
been detached from the host driver + attached to the stub driver.

Moreover, we set orignal states of PCI devices when build PCI devices

s/we set orignal/set the original/

s/build/building/

...
list in virHostdevGetPCIHostDeviceList if states has been set in struct virDomainHostdevDefPtr.

So the "second issue" (as I see it) is that ReAttach processing wasn't handling returning devices that had stub driver attachment (at some point in time). This issue occurs regardless of the libvirtd restart and domain shutdown processing order. Even though it's related, it's different and would need its own patch.

...
Signed-off-by: Wu Zongyong <cordius.wu@huawei.com> --- src/util/virhostdev.c | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/src/util/virhostdev.c b/src/util/virhostdev.c index ca79c37..ecf95e3 100644 --- a/src/util/virhostdev.c +++ b/src/util/virhostdev.c @@ -235,6 +235,7 @@

virHostdevGetPCIHostDeviceList(virDomainHostdevDefPtr *hostdevs, int nhostdevs)

...
for (i = 0; i < nhostdevs; i++) { virDomainHostdevDefPtr hostdev = hostdevs[i]; virDomainHostdevSubsysPCIPtr pcisrc = &hostdev->source.subsys.u.pci; + virDomainHostdevOrigStatesPtr origstates = + &hostdev->origstates; virPCIDevicePtr pci;

if (hostdev->mode != VIR_DOMAIN_HOSTDEV_MODE_SUBSYS) @@ -262,6 +263,10 @@ virHostdevGetPCIHostDeviceList(virDomainHostdevDefPtr

*hostdevs, int nhostdevs)

...
virPCIDeviceSetStubDriver(pci, VIR_PCI_STUB_DRIVER_XEN); else virPCIDeviceSetStubDriver(pci, VIR_PCI_STUB_DRIVER_KVM); + + virPCIDeviceSetUnbindFromStub(pci, origstates- states.pci.unbind_from_stub); + virPCIDeviceSetRemoveSlot(pci, origstates- states.pci.remove_slot); + virPCIDeviceSetReprobe(pci, origstates->states.pci.reprobe);

This looks a lot like what virHostdevUpdateActivePCIDevices is doing - in fact I think that is the libvirtd restart "bug" you're chasing - qemuProcessReconnect will call qemuHostdevUpdateActiveDomainDevices after the call to qemuConnectMonitor - at least w/r/t this PCI stub interaction.

Since the domain was stopped connecting to the monitor will fail (in my case qemuMonitorOpenUnix), thus falling into qemuProcessStop, but without having updated (or ReAttach'ing) the PCI devices. So, in your environment without these two changes - could you check to see what happens if the order is changed? Does that fix the issue?

That is move the call to qemuHostdevUpdateActiveDomainDevices to right after qemuProcessPrepareAllowReboot.

Some more "detailed" thoughts I had on this while investigating...

Both virHostdevPreparePCIDevices and virHostdevReAttachPCIDevices will call virHostdevGetPCIHostDeviceList. However, in the case of the former, "Step 7" is "set the original states for hostdev def".

Following call traces for virHostdevPreparePCIDevices eventually leads back to :

1. qemuMigrationDstPrepareAny (migration) 2. qemuProcessStart (guest startup) 3. qemuDomainAttachHostDevice (hotplug).

IOW: This is the "normal" running mode.

So I have a concern regarding the "order of operation" in this case especially since step 7 has the caveat of (actual) being true before setting the bools.

Considering the "other pah" through virHostdevReAttachPCIDevices, callers are:

1. libxlDomainAttachHostPCIDevice (an error: path) 2. libxlDomainDetachHostPCIDevice (normal path) 3. qemuHostdevReAttachPCIDevices (normal path) 4. virHostdevReAttachDomainDevices (libxl domain close processing)

Focusing on qemuHostdevReAttachPCIDevices I find 3 callers:

1. qemuHostdevReAttachDomainDevices (called by qemuProcessStop) 2. qemuDomainAttachHostPCIDevice (hotplug error processing) 3. qemuDomainRemovePCIHostDevice (hot unplug processing).

So while setting things up seems to be the right thing to do in the ReAttach case, the normal attach case it doesn't seem so. Perhaps too generic.

...
}

return pcidevs; @@ -1008,8 +1013,19 @@ virHostdevReAttachPCIDevices(virHostdevManagerPtr

mgr,

...
continue; } } else { - virPCIDeviceListDel(pcidevs, pci); - continue;

The above was added in commit 4cdbff3d5 by Andrea and just removed inactiveDev's from the @pcidevs since his change added both onto @pcidevs (at least that's my reading of it).

But with your proposed change rather than deleting that from @pcidevs, you're moving it to the inactiveDev's list *if* there's a stub.

So what happens if there isn't a stub? From this point forward @pcidevs would contain both active and inactive devices.

Then step 2 comes along and runs the @pcidevs "stealing" to place the device on the inactiveDevs list. So now wouldn't we have duplicate devices there?

Now if the new code went before the above 2 lines, then probably no big deal - that way we're adding back to the inactive list for these stub drivers and we're not keeping the device in @pcidevs meaning it wouldn't be readded to inactive afterwards.

Hopefully this makes sense.

...
+ int stub = virPCIDeviceGetStubDriver(pci); + if (stub > VIR_PCI_STUB_DRIVER_NONE && + stub < VIR_PCI_STUB_DRIVER_LAST) { + /* The device is bound to a known stub driver: add a copy + * to the inactive list */ + VIR_DEBUG("Adding PCI device %s to inactive list",

s/%s/'%s'/

(both above and below)

...
+ virPCIDeviceGetName(pci)); + if (virPCIDeviceListAddCopy(mgr->inactivePCIHostdevs, pci) < 0) { + VIR_ERROR(_("Failed to add PCI device %s to the inactive list"), + virGetLastErrorMessage()); + virResetLastError(); + } + }

For reference, this is a copy of what's in virHostdevPreparePCIDevices step 2. I think you need to do some extra debugging and be sure you're not duplicating things. IOW: Add some debug code to dump the active and inactive lists.

Although this is really long - I hope it all makes sense.

John Very sorry for the late, I have been occupied with a new project these days. Very sorry. Also thanks for correcting my poor English.

No problem... We're all busy in different ways. The only hazard in delayed followups for me is short term memory problems ;-). The grammar/spelling is just part of the process - even people who are native english speakers aren't always "good" at it.

...

Back to the question, you suggested to move the call to qemuHostdevUpdateActiveDomainDevices to right after qemuProcessPrepareAllowReboot. Indeed, adjusting the order can fix the problem. So, maybe we don't need to talk other aspect You mentioned? All you mentioned really make sense and I don't consider so much indeed.

Thanks, Zongyong Wu

If moving the call works, then send a patch with just that! I thought "logistically" it would work, but didn't have the empirical evidence. If the "second issue" is no longer an issue, then fine we can just drop and forget about it. John

2722

Age (days ago)

2742

Last active (days ago)

List overview

Download

3 comments

3 participants

participants (3)

John Ferlan
Wu Zongyong
Wuzongyong (Euler Dept)