[libvirt] [PATCH] qemu: Do not reattach PCI device used by other domain when shutdown

When failing on starting a domain, it tries to reattach all the PCI devices defined in the domain conf, regardless of whether the devices are still used by other domain. This will cause the devices are deleted from the list qemu_driver->activePciHostdevs, thus the devices will be thought as usable even if it's not true. And following commands nodedev-{reattach,reset} will be successful. How to reproduce: 1) Define two domains with same PCI device defined in the confs. 2) # virsh start domain1 3) # virsh start domain2 4) # virsh nodedev-reattach $pci_device You will see the device will be reattached to host successfully. As pciDeviceReattach just check if the device is still used by other domain via checking if the device is in list driver->activePciHostdevs, however, the device is deleted from the list by step 2). This patch is to prohibit the bug by: 1) Prohibit a domain starting or device attachment right at preparation period (qemuPrepareHostdevPCIDevices) if the device is in list driver->activePciHostdevs, which means it's used by other domain. 2) Introduces a new field for struct _pciDevice, (char *used_by), it will be set as the domain name at preparation period, (qemuPrepareHostdevPCIDevices). Thus we can prohibit deleting the device from driver->activePciHostdevs if it's still used by other domain when stopping the domain process. * src/pci.h (define two internal functions, pciDeviceSetUsedBy and pciDevceGetUsedBy) * src/pci.c (new field "char *used_by" for struct _pciDevice, implementations for the two new functions) * src/libvirt_private.syms (Add the two new internal functions) * src/qemu_hostdev.h (Modify the definition of functions qemuPrepareHostdevPCIDevices, and qemuDomainReAttachHostdevDevices) * src/qemu_hostdev.c (Prohibit preparation and don't delete the device from activePciHostdevs list if it's still used by other domain) * src/qemu_hotplug.c (Update function usage, as the definitions are changed) --- src/libvirt_private.syms | 2 ++ src/qemu/qemu_hostdev.c | 31 ++++++++++++++++++++++++++++--- src/qemu/qemu_hostdev.h | 2 ++ src/qemu/qemu_hotplug.c | 4 ++-- src/util/pci.c | 22 ++++++++++++++++++++++ src/util/pci.h | 3 +++ 6 files changed, 59 insertions(+), 5 deletions(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 8235ea1..a5c5e6c 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -872,6 +872,7 @@ virNWFilterHashTableRemoveEntry; pciDettachDevice; pciDeviceFileIterate; pciDeviceGetManaged; +pciDeviceGetUsedBy; pciDeviceIsAssignable; pciDeviceIsVirtualFunction; pciDeviceListAdd; @@ -884,6 +885,7 @@ pciDeviceListSteal; pciDeviceNetName; pciDeviceReAttachInit; pciDeviceSetManaged; +pciDeviceSetUsedBy; pciFreeDevice; pciGetDevice; pciGetPhysicalFunction; diff --git a/src/qemu/qemu_hostdev.c b/src/qemu/qemu_hostdev.c index 6f77717..ef9e3b7 100644 --- a/src/qemu/qemu_hostdev.c +++ b/src/qemu/qemu_hostdev.c @@ -101,6 +101,7 @@ cleanup: int qemuPrepareHostdevPCIDevices(struct qemud_driver *driver, + const char *name, virDomainHostdevDefPtr *hostdevs, int nhostdevs) { @@ -126,7 +127,10 @@ int qemuPrepareHostdevPCIDevices(struct qemud_driver *driver, for (i = 0; i < pciDeviceListCount(pcidevs); i++) { pciDevice *dev = pciDeviceListGet(pcidevs, i); if (!pciDeviceIsAssignable(dev, !driver->relaxedACS)) - goto reattachdevs; + goto cleanup; + + if (pciDeviceListFind(driver->activePciHostdevs, dev)) + goto cleanup; if (pciDeviceGetManaged(dev) && pciDettachDevice(dev, driver->activePciHostdevs) < 0) @@ -156,6 +160,14 @@ int qemuPrepareHostdevPCIDevices(struct qemud_driver *driver, pciDeviceListSteal(pcidevs, dev); } + /* Now set the used_by_domain of the device in driver->activePciHostdevs + * as domain name. + */ + for (i = 0; i < pciDeviceListCount(driver->activePciHostdevs); i++) { + pciDevice * dev = pciDeviceListGet(driver->activePciHostdevs, i); + pciDeviceSetUsedBy(dev, name); + } + ret = 0; goto cleanup; @@ -183,7 +195,7 @@ static int qemuPrepareHostPCIDevices(struct qemud_driver *driver, virDomainDefPtr def) { - return qemuPrepareHostdevPCIDevices(driver, def->hostdevs, def->nhostdevs); + return qemuPrepareHostdevPCIDevices(driver, def->name, def->hostdevs, def->nhostdevs); } @@ -258,11 +270,13 @@ void qemuReattachPciDevice(pciDevice *dev, struct qemud_driver *driver) void qemuDomainReAttachHostdevDevices(struct qemud_driver *driver, + const char *name, virDomainHostdevDefPtr *hostdevs, int nhostdevs) { pciDeviceList *pcidevs; int i; + const char *used_by = NULL; if (!(pcidevs = qemuGetPciHostDeviceList(hostdevs, nhostdevs))) { virErrorPtr err = virGetLastError(); @@ -277,6 +291,17 @@ void qemuDomainReAttachHostdevDevices(struct qemud_driver *driver, for (i = 0; i < pciDeviceListCount(pcidevs); i++) { pciDevice *dev = pciDeviceListGet(pcidevs, i); + pciDevice *activeDev = NULL; + + /* Never delete the dev from list driver->activePciHostdevs + * if it's used by other domain. + */ + activeDev = pciDeviceListFind(driver->activePciHostdevs, dev); + if (activeDev && + (used_by = pciDeviceGetUsedBy(activeDev)) && + STRNEQ(used_by, name)) + continue; + pciDeviceListDel(driver->activePciHostdevs, dev); } @@ -305,5 +330,5 @@ void qemuDomainReAttachHostDevices(struct qemud_driver *driver, if (!def->nhostdevs) return; - qemuDomainReAttachHostdevDevices(driver, def->hostdevs, def->nhostdevs); + qemuDomainReAttachHostdevDevices(driver, def->name, def->hostdevs, def->nhostdevs); } diff --git a/src/qemu/qemu_hostdev.h b/src/qemu/qemu_hostdev.h index 1f3d1bc..07d7de2 100644 --- a/src/qemu/qemu_hostdev.h +++ b/src/qemu/qemu_hostdev.h @@ -30,12 +30,14 @@ int qemuUpdateActivePciHostdevs(struct qemud_driver *driver, virDomainDefPtr def); int qemuPrepareHostdevPCIDevices(struct qemud_driver *driver, + const char *name, virDomainHostdevDefPtr *hostdevs, int nhostdevs); int qemuPrepareHostDevices(struct qemud_driver *driver, virDomainDefPtr def); void qemuReattachPciDevice(pciDevice *dev, struct qemud_driver *driver); void qemuDomainReAttachHostdevDevices(struct qemud_driver *driver, + const char *name, virDomainHostdevDefPtr *hostdevs, int nhostdevs); void qemuDomainReAttachHostDevices(struct qemud_driver *driver, diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 6cfe392..dc920e7 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -859,7 +859,7 @@ int qemuDomainAttachHostPciDevice(struct qemud_driver *driver, return -1; } - if (qemuPrepareHostdevPCIDevices(driver, &hostdev, 1) < 0) + if (qemuPrepareHostdevPCIDevices(driver, vm->def->name, &hostdev, 1) < 0) return -1; if (qemuCapsGet(priv->qemuCaps, QEMU_CAPS_DEVICE)) { @@ -925,7 +925,7 @@ error: hostdev->info.addr.pci.slot) < 0) VIR_WARN("Unable to release PCI address on host device"); - qemuDomainReAttachHostdevDevices(driver, &hostdev, 1); + qemuDomainReAttachHostdevDevices(driver, vm->def->name, &hostdev, 1); VIR_FREE(devstr); VIR_FREE(configfd_name); diff --git a/src/util/pci.c b/src/util/pci.c index 8d8e157..38548c7 100644 --- a/src/util/pci.c +++ b/src/util/pci.c @@ -62,6 +62,7 @@ struct _pciDevice { char name[PCI_ADDR_LEN]; /* domain:bus:slot.function */ char id[PCI_ID_LEN]; /* product vendor */ char *path; + char *used_by; /* The domain which uses the device */ int fd; unsigned initted; @@ -1312,6 +1313,7 @@ pciGetDevice(unsigned domain, dev->bus = bus; dev->slot = slot; dev->function = function; + dev->used_by = NULL; if (snprintf(dev->name, sizeof(dev->name), "%.4x:%.2x:%.2x.%.1x", dev->domain, dev->bus, dev->slot, @@ -1374,6 +1376,7 @@ pciFreeDevice(pciDevice *dev) VIR_DEBUG("%s %s: freeing", dev->id, dev->name); pciCloseConfig(dev); VIR_FREE(dev->path); + VIR_FREE(dev->used_by); VIR_FREE(dev); } @@ -1387,6 +1390,25 @@ unsigned pciDeviceGetManaged(pciDevice *dev) return dev->managed; } +int +pciDeviceSetUsedBy(pciDevice *dev, const char *name) +{ + dev->used_by = strdup(name); + + if (!dev->used_by) { + virReportOOMError(); + return -1; + } + + return 0; +} + +const char * +pciDeviceGetUsedBy(pciDevice *dev) +{ + return dev->used_by; +} + void pciDeviceReAttachInit(pciDevice *pci) { pci->unbind_from_stub = 1; diff --git a/src/util/pci.h b/src/util/pci.h index a1600fe..c9d8227 100644 --- a/src/util/pci.h +++ b/src/util/pci.h @@ -47,6 +47,9 @@ int pciResetDevice (pciDevice *dev, void pciDeviceSetManaged(pciDevice *dev, unsigned managed); unsigned pciDeviceGetManaged(pciDevice *dev); +int pciDeviceSetUsedBy(pciDevice *dev, + const char *used_by); +const char *pciDeviceGetUsedBy(pciDevice *dev); void pciDeviceReAttachInit(pciDevice *dev); pciDeviceList *pciDeviceListNew (void); -- 1.7.6

Anybody could help review this? Thanks Osier 于 2011年09月27日 14:53, Osier Yang 写道:
When failing on starting a domain, it tries to reattach all the PCI devices defined in the domain conf, regardless of whether the devices are still used by other domain. This will cause the devices are deleted from the list qemu_driver->activePciHostdevs, thus the devices will be thought as usable even if it's not true. And following commands nodedev-{reattach,reset} will be successful.
How to reproduce: 1) Define two domains with same PCI device defined in the confs. 2) # virsh start domain1 3) # virsh start domain2 4) # virsh nodedev-reattach $pci_device
You will see the device will be reattached to host successfully. As pciDeviceReattach just check if the device is still used by other domain via checking if the device is in list driver->activePciHostdevs, however, the device is deleted from the list by step 2).
This patch is to prohibit the bug by: 1) Prohibit a domain starting or device attachment right at preparation period (qemuPrepareHostdevPCIDevices) if the device is in list driver->activePciHostdevs, which means it's used by other domain.
2) Introduces a new field for struct _pciDevice, (char *used_by), it will be set as the domain name at preparation period, (qemuPrepareHostdevPCIDevices). Thus we can prohibit deleting the device from driver->activePciHostdevs if it's still used by other domain when stopping the domain process.
* src/pci.h (define two internal functions, pciDeviceSetUsedBy and pciDevceGetUsedBy) * src/pci.c (new field "char *used_by" for struct _pciDevice, implementations for the two new functions) * src/libvirt_private.syms (Add the two new internal functions) * src/qemu_hostdev.h (Modify the definition of functions qemuPrepareHostdevPCIDevices, and qemuDomainReAttachHostdevDevices) * src/qemu_hostdev.c (Prohibit preparation and don't delete the device from activePciHostdevs list if it's still used by other domain) * src/qemu_hotplug.c (Update function usage, as the definitions are changed) --- src/libvirt_private.syms | 2 ++ src/qemu/qemu_hostdev.c | 31 ++++++++++++++++++++++++++++--- src/qemu/qemu_hostdev.h | 2 ++ src/qemu/qemu_hotplug.c | 4 ++-- src/util/pci.c | 22 ++++++++++++++++++++++ src/util/pci.h | 3 +++ 6 files changed, 59 insertions(+), 5 deletions(-)
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 8235ea1..a5c5e6c 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -872,6 +872,7 @@ virNWFilterHashTableRemoveEntry; pciDettachDevice; pciDeviceFileIterate; pciDeviceGetManaged; +pciDeviceGetUsedBy; pciDeviceIsAssignable; pciDeviceIsVirtualFunction; pciDeviceListAdd; @@ -884,6 +885,7 @@ pciDeviceListSteal; pciDeviceNetName; pciDeviceReAttachInit; pciDeviceSetManaged; +pciDeviceSetUsedBy; pciFreeDevice; pciGetDevice; pciGetPhysicalFunction; diff --git a/src/qemu/qemu_hostdev.c b/src/qemu/qemu_hostdev.c index 6f77717..ef9e3b7 100644 --- a/src/qemu/qemu_hostdev.c +++ b/src/qemu/qemu_hostdev.c @@ -101,6 +101,7 @@ cleanup:
int qemuPrepareHostdevPCIDevices(struct qemud_driver *driver, + const char *name, virDomainHostdevDefPtr *hostdevs, int nhostdevs) { @@ -126,7 +127,10 @@ int qemuPrepareHostdevPCIDevices(struct qemud_driver *driver, for (i = 0; i < pciDeviceListCount(pcidevs); i++) { pciDevice *dev = pciDeviceListGet(pcidevs, i); if (!pciDeviceIsAssignable(dev, !driver->relaxedACS)) - goto reattachdevs; + goto cleanup; + + if (pciDeviceListFind(driver->activePciHostdevs, dev)) + goto cleanup;
if (pciDeviceGetManaged(dev) && pciDettachDevice(dev, driver->activePciHostdevs) < 0) @@ -156,6 +160,14 @@ int qemuPrepareHostdevPCIDevices(struct qemud_driver *driver, pciDeviceListSteal(pcidevs, dev); }
+ /* Now set the used_by_domain of the device in driver->activePciHostdevs + * as domain name. + */ + for (i = 0; i < pciDeviceListCount(driver->activePciHostdevs); i++) { + pciDevice * dev = pciDeviceListGet(driver->activePciHostdevs, i); + pciDeviceSetUsedBy(dev, name); + } + ret = 0; goto cleanup;
@@ -183,7 +195,7 @@ static int qemuPrepareHostPCIDevices(struct qemud_driver *driver, virDomainDefPtr def) { - return qemuPrepareHostdevPCIDevices(driver, def->hostdevs, def->nhostdevs); + return qemuPrepareHostdevPCIDevices(driver, def->name, def->hostdevs, def->nhostdevs); }
@@ -258,11 +270,13 @@ void qemuReattachPciDevice(pciDevice *dev, struct qemud_driver *driver)
void qemuDomainReAttachHostdevDevices(struct qemud_driver *driver, + const char *name, virDomainHostdevDefPtr *hostdevs, int nhostdevs) { pciDeviceList *pcidevs; int i; + const char *used_by = NULL;
if (!(pcidevs = qemuGetPciHostDeviceList(hostdevs, nhostdevs))) { virErrorPtr err = virGetLastError(); @@ -277,6 +291,17 @@ void qemuDomainReAttachHostdevDevices(struct qemud_driver *driver,
for (i = 0; i < pciDeviceListCount(pcidevs); i++) { pciDevice *dev = pciDeviceListGet(pcidevs, i); + pciDevice *activeDev = NULL; + + /* Never delete the dev from list driver->activePciHostdevs + * if it's used by other domain. + */ + activeDev = pciDeviceListFind(driver->activePciHostdevs, dev); + if (activeDev && + (used_by = pciDeviceGetUsedBy(activeDev)) && + STRNEQ(used_by, name)) + continue; + pciDeviceListDel(driver->activePciHostdevs, dev); }
@@ -305,5 +330,5 @@ void qemuDomainReAttachHostDevices(struct qemud_driver *driver, if (!def->nhostdevs) return;
- qemuDomainReAttachHostdevDevices(driver, def->hostdevs, def->nhostdevs); + qemuDomainReAttachHostdevDevices(driver, def->name, def->hostdevs, def->nhostdevs); } diff --git a/src/qemu/qemu_hostdev.h b/src/qemu/qemu_hostdev.h index 1f3d1bc..07d7de2 100644 --- a/src/qemu/qemu_hostdev.h +++ b/src/qemu/qemu_hostdev.h @@ -30,12 +30,14 @@ int qemuUpdateActivePciHostdevs(struct qemud_driver *driver, virDomainDefPtr def); int qemuPrepareHostdevPCIDevices(struct qemud_driver *driver, + const char *name, virDomainHostdevDefPtr *hostdevs, int nhostdevs); int qemuPrepareHostDevices(struct qemud_driver *driver, virDomainDefPtr def); void qemuReattachPciDevice(pciDevice *dev, struct qemud_driver *driver); void qemuDomainReAttachHostdevDevices(struct qemud_driver *driver, + const char *name, virDomainHostdevDefPtr *hostdevs, int nhostdevs); void qemuDomainReAttachHostDevices(struct qemud_driver *driver, diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 6cfe392..dc920e7 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -859,7 +859,7 @@ int qemuDomainAttachHostPciDevice(struct qemud_driver *driver, return -1; }
- if (qemuPrepareHostdevPCIDevices(driver, &hostdev, 1) < 0) + if (qemuPrepareHostdevPCIDevices(driver, vm->def->name, &hostdev, 1) < 0) return -1;
if (qemuCapsGet(priv->qemuCaps, QEMU_CAPS_DEVICE)) { @@ -925,7 +925,7 @@ error: hostdev->info.addr.pci.slot) < 0) VIR_WARN("Unable to release PCI address on host device");
- qemuDomainReAttachHostdevDevices(driver, &hostdev, 1); + qemuDomainReAttachHostdevDevices(driver, vm->def->name, &hostdev, 1);
VIR_FREE(devstr); VIR_FREE(configfd_name); diff --git a/src/util/pci.c b/src/util/pci.c index 8d8e157..38548c7 100644 --- a/src/util/pci.c +++ b/src/util/pci.c @@ -62,6 +62,7 @@ struct _pciDevice { char name[PCI_ADDR_LEN]; /* domain:bus:slot.function */ char id[PCI_ID_LEN]; /* product vendor */ char *path; + char *used_by; /* The domain which uses the device */ int fd;
unsigned initted; @@ -1312,6 +1313,7 @@ pciGetDevice(unsigned domain, dev->bus = bus; dev->slot = slot; dev->function = function; + dev->used_by = NULL;
if (snprintf(dev->name, sizeof(dev->name), "%.4x:%.2x:%.2x.%.1x", dev->domain, dev->bus, dev->slot, @@ -1374,6 +1376,7 @@ pciFreeDevice(pciDevice *dev) VIR_DEBUG("%s %s: freeing", dev->id, dev->name); pciCloseConfig(dev); VIR_FREE(dev->path); + VIR_FREE(dev->used_by); VIR_FREE(dev); }
@@ -1387,6 +1390,25 @@ unsigned pciDeviceGetManaged(pciDevice *dev) return dev->managed; }
+int +pciDeviceSetUsedBy(pciDevice *dev, const char *name) +{ + dev->used_by = strdup(name); + + if (!dev->used_by) { + virReportOOMError(); + return -1; + } + + return 0; +} + +const char * +pciDeviceGetUsedBy(pciDevice *dev) +{ + return dev->used_by; +} + void pciDeviceReAttachInit(pciDevice *pci) { pci->unbind_from_stub = 1; diff --git a/src/util/pci.h b/src/util/pci.h index a1600fe..c9d8227 100644 --- a/src/util/pci.h +++ b/src/util/pci.h @@ -47,6 +47,9 @@ int pciResetDevice (pciDevice *dev, void pciDeviceSetManaged(pciDevice *dev, unsigned managed); unsigned pciDeviceGetManaged(pciDevice *dev); +int pciDeviceSetUsedBy(pciDevice *dev, + const char *used_by); +const char *pciDeviceGetUsedBy(pciDevice *dev); void pciDeviceReAttachInit(pciDevice *dev);
pciDeviceList *pciDeviceListNew (void);

On 09/27/2011 12:53 AM, Osier Yang wrote: Apologies on the delayed review. This is some hairy code, and I want to make sure we get it right, so I kind of shelved it knowing it would be a longer review.
When failing on starting a domain, it tries to reattach all the PCI devices defined in the domain conf, regardless of whether the devices are still used by other domain. This will cause the devices are deleted
s/are deleted/to be deleted/
from the list qemu_driver->activePciHostdevs, thus the devices will be thought as usable even if it's not true. And following commands nodedev-{reattach,reset} will be successful.
How to reproduce: 1) Define two domains with same PCI device defined in the confs. 2) # virsh start domain1 3) # virsh start domain2 4) # virsh nodedev-reattach $pci_device
You will see the device will be reattached to host successfully. As pciDeviceReattach just check if the device is still used by other domain via checking if the device is in list driver->activePciHostdevs, however, the device is deleted from the list by step 2).
Ouch, and definitely needs patching.
This patch is to prohibit the bug by: 1) Prohibit a domain starting or device attachment right at preparation period (qemuPrepareHostdevPCIDevices) if the device is in list driver->activePciHostdevs, which means it's used by other domain.
Off-hand, I'm not sure if this is quite right - it is completely valid to have two persistent domains both using the same hostdev, _as long as_ you guarantee that at most one of those two domains is active at a time. Is qemuPrepareHostdevPCIDevices called at define time (when the persistent xml is stored) or at start time (when actually starting the guest)? [/me goes and looks...] Yay! qemuPrepareHostdevPCIDevices is only called during hot-plug (domain already active) and by qemuPrepareHostDevices(), which in turn is only used in qemuProcessStart. So it does indeed look like this patch merely prohibits starting domain2 if domain1 is running in your above example, but does not prohibit both definitions from existing.
2) Introduces a new field for struct _pciDevice, (char *used_by), it will be set as the domain name at preparation period, (qemuPrepareHostdevPCIDevices). Thus we can prohibit deleting the device from driver->activePciHostdevs if it's still used by other domain when stopping the domain process.
* src/pci.h (define two internal functions, pciDeviceSetUsedBy and pciDevceGetUsedBy) * src/pci.c (new field "char *used_by" for struct _pciDevice, implementations for the two new functions) * src/libvirt_private.syms (Add the two new internal functions) * src/qemu_hostdev.h (Modify the definition of functions qemuPrepareHostdevPCIDevices, and qemuDomainReAttachHostdevDevices) * src/qemu_hostdev.c (Prohibit preparation and don't delete the device from activePciHostdevs list if it's still used by other domain) * src/qemu_hotplug.c (Update function usage, as the definitions are changed) --- src/libvirt_private.syms | 2 ++ src/qemu/qemu_hostdev.c | 31 ++++++++++++++++++++++++++++--- src/qemu/qemu_hostdev.h | 2 ++ src/qemu/qemu_hotplug.c | 4 ++-- src/util/pci.c | 22 ++++++++++++++++++++++ src/util/pci.h | 3 +++ 6 files changed, 59 insertions(+), 5 deletions(-)
@@ -126,7 +127,10 @@ int qemuPrepareHostdevPCIDevices(struct qemud_driver *driver, for (i = 0; i< pciDeviceListCount(pcidevs); i++) { pciDevice *dev = pciDeviceListGet(pcidevs, i); if (!pciDeviceIsAssignable(dev, !driver->relaxedACS)) - goto reattachdevs; + goto cleanup;
This doesn't seem right, when there are multiple devices. Pre-patch, suppose we had: dev1 - is assignable, is managed, and detach succeeds (nevermind the spelling error in pciDettachDevice) dev2 - is not assignable goto reattachdevs dev1 - is managed, so it is reattached Now we have: dev1 - is assignable, is not active, and detach succeeds dev2 - is not assignable goto cleanup Oops - we left dev1 detached. I think you _have_ to break this into two separate loops (and adjust the line earlier about 4 loops to now call out 5 loops!). Loop 1 - prove that all desired devices are assignable and not already active Loop 2 - for all managed devices, detach them
+ + if (pciDeviceListFind(driver->activePciHostdevs, dev)) + goto cleanup;
This is correct for loop 1, validating that the requested device is not already in use by another active domain.
if (pciDeviceGetManaged(dev)&& pciDettachDevice(dev, driver->activePciHostdevs)< 0) @@ -156,6 +160,14 @@ int qemuPrepareHostdevPCIDevices(struct qemud_driver *driver, pciDeviceListSteal(pcidevs, dev); }
+ /* Now set the used_by_domain of the device in driver->activePciHostdevs + * as domain name. + */ + for (i = 0; i< pciDeviceListCount(driver->activePciHostdevs); i++) { + pciDevice * dev = pciDeviceListGet(driver->activePciHostdevs, i);
Formatting: s/* dev/*dev/ This iterates over all active pci devices, but it _should_ be iterating only over the pci devices in use by the given domain. Otherwise, you have: dom1 - takes dev1, dev1 in use by "dom1" dom2 - takes dev2, now dev1 and dev2 both in use by "dom2"
+ pciDeviceSetUsedBy(dev, name);
Ouch. pciDeviceSetUsedBy is malloc'ing memory, but you aren't checking for failure. More on this below...
@@ -277,6 +291,17 @@ void qemuDomainReAttachHostdevDevices(struct qemud_driver *driver,
for (i = 0; i< pciDeviceListCount(pcidevs); i++) { pciDevice *dev = pciDeviceListGet(pcidevs, i); + pciDevice *activeDev = NULL; + + /* Never delete the dev from list driver->activePciHostdevs + * if it's used by other domain. + */ + activeDev = pciDeviceListFind(driver->activePciHostdevs, dev); + if (activeDev&& + (used_by = pciDeviceGetUsedBy(activeDev))&& + STRNEQ(used_by, name))
I would have simplified these two conditions to one line: STRNEQ(name, pciDeviceGetUsedBy(activeDev))
+++ b/src/util/pci.c @@ -62,6 +62,7 @@ struct _pciDevice { char name[PCI_ADDR_LEN]; /* domain:bus:slot.function */ char id[PCI_ID_LEN]; /* product vendor */ char *path; + char *used_by; /* The domain which uses the device */
As promised above, why not make this 'const char *used_by', since a domain can only own a device for as long as it is active, and as long as the domain is active, then the domain's name will be in scope. Then, you don't have to strdup() the name, and don't have to worry about malloc failure. But you _do_ have to worry about setting the name back to NULL after a domain goes inactive or the device is hotplugged, to update the fact that no active domain owns the device.
int fd;
unsigned initted; @@ -1312,6 +1313,7 @@ pciGetDevice(unsigned domain, dev->bus = bus; dev->slot = slot; dev->function = function; + dev->used_by = NULL;
if (snprintf(dev->name, sizeof(dev->name), "%.4x:%.2x:%.2x.%.1x", dev->domain, dev->bus, dev->slot, @@ -1374,6 +1376,7 @@ pciFreeDevice(pciDevice *dev) VIR_DEBUG("%s %s: freeing", dev->id, dev->name); pciCloseConfig(dev); VIR_FREE(dev->path); + VIR_FREE(dev->used_by);
If you use my suggestion of pointing to the pre-existing domain name, rather than strdup(), then you don't want this line.
VIR_FREE(dev); }
@@ -1387,6 +1390,25 @@ unsigned pciDeviceGetManaged(pciDevice *dev) return dev->managed; }
+int +pciDeviceSetUsedBy(pciDevice *dev, const char *name) +{ + dev->used_by = strdup(name); + + if (!dev->used_by) { + virReportOOMError(); + return -1; + } + + return 0;
and here, you could make the function return void (straight pointer assignment, rather than strdup()). Needs a v2, but we should definitely get this patched. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

于 2011年10月13日 00:41, Eric Blake 写道:
On 09/27/2011 12:53 AM, Osier Yang wrote:
Apologies on the delayed review. This is some hairy code, and I want to make sure we get it right, so I kind of shelved it knowing it would be a longer review.
When failing on starting a domain, it tries to reattach all the PCI devices defined in the domain conf, regardless of whether the devices are still used by other domain. This will cause the devices are deleted
s/are deleted/to be deleted/
from the list qemu_driver->activePciHostdevs, thus the devices will be thought as usable even if it's not true. And following commands nodedev-{reattach,reset} will be successful.
How to reproduce: 1) Define two domains with same PCI device defined in the confs. 2) # virsh start domain1 3) # virsh start domain2 4) # virsh nodedev-reattach $pci_device
You will see the device will be reattached to host successfully. As pciDeviceReattach just check if the device is still used by other domain via checking if the device is in list driver->activePciHostdevs, however, the device is deleted from the list by step 2).
Ouch, and definitely needs patching.
This patch is to prohibit the bug by: 1) Prohibit a domain starting or device attachment right at preparation period (qemuPrepareHostdevPCIDevices) if the device is in list driver->activePciHostdevs, which means it's used by other domain.
Off-hand, I'm not sure if this is quite right - it is completely valid to have two persistent domains both using the same hostdev, _as long as_ you guarantee that at most one of those two domains is active at a time. Is qemuPrepareHostdevPCIDevices called at define time (when the persistent xml is stored) or at start time (when actually starting the guest)?
[/me goes and looks...]
Yay! qemuPrepareHostdevPCIDevices is only called during hot-plug (domain already active) and by qemuPrepareHostDevices(), which in turn is only used in qemuProcessStart. So it does indeed look like this patch merely prohibits starting domain2 if domain1 is running in your above example, but does not prohibit both definitions from existing.
Yes, I checked this too when doing the patch.
2) Introduces a new field for struct _pciDevice, (char *used_by), it will be set as the domain name at preparation period, (qemuPrepareHostdevPCIDevices). Thus we can prohibit deleting the device from driver->activePciHostdevs if it's still used by other domain when stopping the domain process.
* src/pci.h (define two internal functions, pciDeviceSetUsedBy and pciDevceGetUsedBy) * src/pci.c (new field "char *used_by" for struct _pciDevice, implementations for the two new functions) * src/libvirt_private.syms (Add the two new internal functions) * src/qemu_hostdev.h (Modify the definition of functions qemuPrepareHostdevPCIDevices, and qemuDomainReAttachHostdevDevices) * src/qemu_hostdev.c (Prohibit preparation and don't delete the device from activePciHostdevs list if it's still used by other domain) * src/qemu_hotplug.c (Update function usage, as the definitions are changed) --- src/libvirt_private.syms | 2 ++ src/qemu/qemu_hostdev.c | 31 ++++++++++++++++++++++++++++--- src/qemu/qemu_hostdev.h | 2 ++ src/qemu/qemu_hotplug.c | 4 ++-- src/util/pci.c | 22 ++++++++++++++++++++++ src/util/pci.h | 3 +++ 6 files changed, 59 insertions(+), 5 deletions(-)
@@ -126,7 +127,10 @@ int qemuPrepareHostdevPCIDevices(struct qemud_driver *driver, for (i = 0; i< pciDeviceListCount(pcidevs); i++) { pciDevice *dev = pciDeviceListGet(pcidevs, i); if (!pciDeviceIsAssignable(dev, !driver->relaxedACS)) - goto reattachdevs; + goto cleanup;
This doesn't seem right, when there are multiple devices.
Pre-patch, suppose we had: dev1 - is assignable, is managed, and detach succeeds (nevermind the spelling error in pciDettachDevice) dev2 - is not assignable goto reattachdevs dev1 - is managed, so it is reattached
Now we have: dev1 - is assignable, is not active, and detach succeeds dev2 - is not assignable goto cleanup
Oops - we left dev1 detached.
I think you _have_ to break this into two separate loops (and adjust the line earlier about 4 loops to now call out 5 loops!).
Loop 1 - prove that all desired devices are assignable and not already active
Loop 2 - for all managed devices, detach them
Oops, I ignored in the loop there is also pciDeviceDetach. And yes, breaking them into 2 loops is good.
+ + if (pciDeviceListFind(driver->activePciHostdevs, dev)) + goto cleanup;
This is correct for loop 1, validating that the requested device is not already in use by another active domain.
if (pciDeviceGetManaged(dev)&& pciDettachDevice(dev, driver->activePciHostdevs)< 0) @@ -156,6 +160,14 @@ int qemuPrepareHostdevPCIDevices(struct qemud_driver *driver, pciDeviceListSteal(pcidevs, dev); }
+ /* Now set the used_by_domain of the device in driver->activePciHostdevs + * as domain name. + */ + for (i = 0; i< pciDeviceListCount(driver->activePciHostdevs); i++) { + pciDevice * dev = pciDeviceListGet(driver->activePciHostdevs, i);
Formatting: s/* dev/*dev/
This iterates over all active pci devices, but it _should_ be iterating only over the pci devices in use by the given domain. Otherwise, you have:
dom1 - takes dev1, dev1 in use by "dom1" dom2 - takes dev2, now dev1 and dev2 both in use by "dom2"
Ah, right. I was blind on this with testing on just one domain.
+ pciDeviceSetUsedBy(dev, name);
Ouch. pciDeviceSetUsedBy is malloc'ing memory, but you aren't checking for failure. More on this below...
okay
@@ -277,6 +291,17 @@ void qemuDomainReAttachHostdevDevices(struct qemud_driver *driver,
for (i = 0; i< pciDeviceListCount(pcidevs); i++) { pciDevice *dev = pciDeviceListGet(pcidevs, i); + pciDevice *activeDev = NULL; + + /* Never delete the dev from list driver->activePciHostdevs + * if it's used by other domain. + */ + activeDev = pciDeviceListFind(driver->activePciHostdevs, dev); + if (activeDev&& + (used_by = pciDeviceGetUsedBy(activeDev))&& + STRNEQ(used_by, name))
I would have simplified these two conditions to one line:
STRNEQ(name, pciDeviceGetUsedBy(activeDev))
Okay, make sense.
+++ b/src/util/pci.c @@ -62,6 +62,7 @@ struct _pciDevice { char name[PCI_ADDR_LEN]; /* domain:bus:slot.function */ char id[PCI_ID_LEN]; /* product vendor */ char *path; + char *used_by; /* The domain which uses the device */
As promised above, why not make this 'const char *used_by', since a domain can only own a device for as long as it is active, and as long as the domain is active, then the domain's name will be in scope. Then, you don't have to strdup() the name, and don't have to worry about malloc failure. But you _do_ have to worry about setting the name back to NULL after a domain goes inactive or the device is hotplugged, to update the fact that no active domain owns the device.
Sounds good, making a v2.
int fd;
unsigned initted; @@ -1312,6 +1313,7 @@ pciGetDevice(unsigned domain, dev->bus = bus; dev->slot = slot; dev->function = function; + dev->used_by = NULL;
if (snprintf(dev->name, sizeof(dev->name), "%.4x:%.2x:%.2x.%.1x", dev->domain, dev->bus, dev->slot, @@ -1374,6 +1376,7 @@ pciFreeDevice(pciDevice *dev) VIR_DEBUG("%s %s: freeing", dev->id, dev->name); pciCloseConfig(dev); VIR_FREE(dev->path); + VIR_FREE(dev->used_by);
If you use my suggestion of pointing to the pre-existing domain name, rather than strdup(), then you don't want this line.
VIR_FREE(dev); }
@@ -1387,6 +1390,25 @@ unsigned pciDeviceGetManaged(pciDevice *dev) return dev->managed; }
+int +pciDeviceSetUsedBy(pciDevice *dev, const char *name) +{ + dev->used_by = strdup(name); + + if (!dev->used_by) { + virReportOOMError(); + return -1; + } + + return 0;
and here, you could make the function return void (straight pointer assignment, rather than strdup()).
Needs a v2, but we should definitely get this patched.
participants (2)
-
Eric Blake
-
Osier Yang