[libvirt] [PATCH] qemu: Fix race between device rebind and kvm cleanup
by Chris Lalancette
Certain hypervisors (like qemu/kvm) map the PCI bar(s) on
the host when doing device passthrough. This can lead to a race
condition where the hypervisor is still cleaning up the device while
libvirt is trying to re-attach it to the host device driver. To avoid
this situation, we look through /proc/iomem, and if the hypervisor is
still holding onto the bar (denoted by the string in the matcher variable),
then we can wait around a bit for that to clear up.
Signed-off-by: Chris Lalancette <clalance(a)redhat.com>
---
src/libvirt_private.syms | 1 +
src/qemu/qemu_driver.c | 19 ++++++---
src/util/pci.c | 97 ++++++++++++++++++++++++++++++++++++++++++++++
src/util/pci.h | 1 +
4 files changed, 112 insertions(+), 6 deletions(-)
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index c912d81..e42c090 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -439,6 +439,7 @@ pciGetDevice;
pciFreeDevice;
pciDettachDevice;
pciReAttachDevice;
+pciWaitForDeviceCleanup;
pciResetDevice;
pciDeviceSetManaged;
pciDeviceGetManaged;
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index e14ed8f..cde0034 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -2244,6 +2244,7 @@ qemuDomainReAttachHostDevices(virConnectPtr conn,
{
pciDeviceList *pcidevs;
int i;
+ int retries = 100;
if (!def->nhostdevs)
return;
@@ -2277,12 +2278,18 @@ qemuDomainReAttachHostDevices(virConnectPtr conn,
for (i = 0; i < pciDeviceListCount(pcidevs); i++) {
pciDevice *dev = pciDeviceListGet(pcidevs, i);
- if (pciDeviceGetManaged(dev) &&
- pciReAttachDevice(NULL, dev) < 0) {
- virErrorPtr err = virGetLastError();
- VIR_ERROR(_("Failed to re-attach PCI device: %s"),
- err ? err->message : "");
- virResetError(err);
+ if (pciDeviceGetManaged(dev)) {
+ while (pciWaitForDeviceCleanup(dev, "kvm_assigned_device")
+ && retries) {
+ usleep(100*1000);
+ retries--;
+ }
+ if (pciReAttachDevice(NULL, dev) < 0) {
+ virErrorPtr err = virGetLastError();
+ VIR_ERROR(_("Failed to re-attach PCI device: %s"),
+ err ? err->message : "");
+ virResetError(err);
+ }
}
}
diff --git a/src/util/pci.c b/src/util/pci.c
index 0defbfe..09535bd 100644
--- a/src/util/pci.c
+++ b/src/util/pci.c
@@ -883,6 +883,103 @@ pciReAttachDevice(virConnectPtr conn, pciDevice *dev)
return pciUnBindDeviceFromStub(conn, dev, driver);
}
+/* Certain hypervisors (like qemu/kvm) map the PCI bar(s) on
+ * the host when doing device passthrough. This can lead to a race
+ * condition where the hypervisor is still cleaning up the device while
+ * libvirt is trying to re-attach it to the host device driver. To avoid
+ * this situation, we look through /proc/iomem, and if the hypervisor is
+ * still holding onto the bar (denoted by the string in the matcher variable),
+ * then we can wait around a bit for that to clear up.
+ *
+ * A typical /proc/iomem looks like this (snipped for brevity):
+ * 00010000-0008efff : System RAM
+ * 0008f000-0008ffff : reserved
+ * ...
+ * 00100000-cc9fcfff : System RAM
+ * 00200000-00483d3b : Kernel code
+ * 00483d3c-005c88df : Kernel data
+ * cc9fd000-ccc71fff : ACPI Non-volatile Storage
+ * ...
+ * d0200000-d02fffff : PCI Bus #05
+ * d0200000-d021ffff : 0000:05:00.0
+ * d0200000-d021ffff : e1000e
+ * d0220000-d023ffff : 0000:05:00.0
+ * d0220000-d023ffff : e1000e
+ * ...
+ * f0000000-f0003fff : 0000:00:1b.0
+ * f0000000-f0003fff : kvm_assigned_device
+ *
+ * Returns 0 if we are clear to continue, and 1 if the hypervisor is still
+ * holding onto the resource.
+ */
+int
+pciWaitForDeviceCleanup(pciDevice *dev, const char *matcher)
+{
+ FILE *fp;
+ char line[160];
+ unsigned long long start, end;
+ int consumed;
+ char *rest;
+ unsigned long long domain;
+ int bus, slot, function;
+ int in_matching_device;
+ int ret;
+ size_t match_depth;
+
+ fp = fopen("/proc/iomem", "r");
+ if (!fp) {
+ /* If we failed to open iomem, we just basically ignore the error. The
+ * unbind might succeed anyway, and besides, it's very likely we have
+ * no way to report the error
+ */
+ VIR_DEBUG0("Failed to open /proc/iomem, trying to continue anyway");
+ return 0;
+ }
+
+ ret = 0;
+ in_matching_device = 0;
+ match_depth = 0;
+ while (fgets(line, sizeof(line), fp) != 0) {
+ /* the logic here is a bit confusing. For each line, we look to
+ * see if it matches the domain:bus:slot.function we were given.
+ * If this line matches the DBSF, then any subsequent lines indented
+ * by 2 spaces are the PCI regions for this device. It's also
+ * possible that none of the PCI regions are currently mapped, in
+ * which case we have no indented regions. This code handles all
+ * of these situations
+ */
+ if (in_matching_device && (strspn(line, " ") == (match_depth + 2))) {
+ if (sscanf(line, "%Lx-%Lx : %n", &start, &end, &consumed) != 2)
+ continue;
+
+ rest = line + consumed;
+ if (STRPREFIX(rest, matcher)) {
+ ret = 1;
+ break;
+ }
+ }
+ else {
+ in_matching_device = 0;
+ if (sscanf(line, "%Lx-%Lx : %n", &start, &end, &consumed) != 2)
+ continue;
+
+ rest = line + consumed;
+ if (sscanf(rest, "%Lx:%x:%x.%x", &domain, &bus, &slot, &function) != 4)
+ continue;
+
+ if (domain != dev->domain || bus != dev->bus || slot != dev->slot ||
+ function != dev->function)
+ continue;
+ in_matching_device = 1;
+ match_depth = strspn(line, " ");
+ }
+ }
+
+ fclose(fp);
+
+ return ret;
+}
+
static char *
pciReadDeviceID(pciDevice *dev, const char *id_name)
{
diff --git a/src/util/pci.h b/src/util/pci.h
index a1a19e7..e6ab137 100644
--- a/src/util/pci.h
+++ b/src/util/pci.h
@@ -81,5 +81,6 @@ int pciDeviceFileIterate(virConnectPtr conn,
int pciDeviceIsAssignable(virConnectPtr conn,
pciDevice *dev,
int strict_acs_check);
+int pciWaitForDeviceCleanup(pciDevice *dev, const char *matcher);
#endif /* __VIR_PCI_H__ */
--
1.6.6
14 years, 10 months
[libvirt] [PATCH] Add a rule to check for uses of readlink.
by Chris Lalancette
Signed-off-by: Chris Lalancette <clalance(a)redhat.com>
---
.x-sc_prohibit_readlink | 2 ++
cfg.mk | 5 +++++
2 files changed, 7 insertions(+), 0 deletions(-)
create mode 100644 .x-sc_prohibit_readlink
diff --git a/.x-sc_prohibit_readlink b/.x-sc_prohibit_readlink
new file mode 100644
index 0000000..e7acb03
--- /dev/null
+++ b/.x-sc_prohibit_readlink
@@ -0,0 +1,2 @@
+^src/util/util\.c$
+^ChangeLog-old$
diff --git a/cfg.mk b/cfg.mk
index 0f2d2a6..5013e4b 100644
--- a/cfg.mk
+++ b/cfg.mk
@@ -94,6 +94,11 @@ sc_prohibit_strncpy:
msg='use virStrncpy, not strncpy' \
$(_prohibit_regexp)
+sc_prohibit_readlink:
+ @re='readlink *\(' \
+ msg='use virFileResolveLink, not readlink' \
+ $(_prohibit_regexp)
+
sc_prohibit_gethostname:
@re='gethostname *\(' \
msg='use virGetHostname, not gethostname' \
--
1.6.6
14 years, 10 months
[libvirt] [PATCH] Use virFileResolveLink instead of readlink in AppArmor
by Chris Lalancette
Signed-off-by: Chris Lalancette <clalance(a)redhat.com>
---
src/security/security_apparmor.c | 17 +++++++++--------
1 files changed, 9 insertions(+), 8 deletions(-)
diff --git a/src/security/security_apparmor.c b/src/security/security_apparmor.c
index f288645..0857d58 100644
--- a/src/security/security_apparmor.c
+++ b/src/security/security_apparmor.c
@@ -258,22 +258,23 @@ get_profile_name(virConnectPtr conn, virDomainObjPtr vm)
static int
use_apparmor(void)
{
- char libvirt_daemon[PATH_MAX];
int rc = -1;
- ssize_t len = 0;
+ char *libvirt_daemon = NULL;
- if ((len = readlink("/proc/self/exe", libvirt_daemon,
- PATH_MAX - 1)) < 0) {
+ if (virFileResolveLink("/proc/self/exe", &libvirt_daemon) < 0) {
virSecurityReportError(NULL, VIR_ERR_INTERNAL_ERROR,
"%s", _("could not find libvirtd"));
- return rc;
+ return -1;
}
- libvirt_daemon[len] = '\0';
if (access(APPARMOR_PROFILES_PATH, R_OK) != 0)
- return rc;
+ goto cleanup;
- return profile_status(libvirt_daemon, 1);
+ rc = profile_status(libvirt_daemon, 1);
+
+cleanup:
+ VIR_FREE(libvirt_daemon);
+ return rc;
}
/* Called on libvirtd startup to see if AppArmor is available */
--
1.6.6
14 years, 10 months
Re: [libvirt] [PATCH] Fix a compile warning in parthelper.c
by Chris Lalancette
On 01/22/2010 04:07 AM, Daniel Veillard wrote:
> On Thu, Jan 21, 2010 at 11:33:18AM -0500, Chris Lalancette wrote:
>> Signed-off-by: Chris Lalancette <clalance(a)redhat.com>
>> ---
>> src/storage/parthelper.c | 1 +
>> 1 files changed, 1 insertions(+), 0 deletions(-)
>>
>> diff --git a/src/storage/parthelper.c b/src/storage/parthelper.c
>> index 5df46e8..ab04842 100644
>> --- a/src/storage/parthelper.c
>> +++ b/src/storage/parthelper.c
>> @@ -34,6 +34,7 @@
>>
>> #include <parted/parted.h>
>> #include <stdio.h>
>> +#include <string.h>
>>
>> /* we don't need to include the full internal.h just for this */
>> #define STREQ(a,b) (strcmp((a),(b)) == 0)
>
> ACK,
Thanks, pushed.
--
Chris Lalancette
14 years, 10 months
[libvirt] [PATCH] Fix device assignment with root devices
by Chris Lalancette
The patches to add ACS checking to PCI device passthrough
introduced a bug. With the current code, if you try to
passthrough a device on the root bus (i.e. bus 0), then
it denies the passthrough. This is because the code in
pciDeviceIsBehindSwitchLackingACS() to check for a parent
device doesn't take into account the possibility of the
root bus. If we are on the root bus, it means we
legitimately can't find a parent, and it also means that
we don't have to worry about whether ACS is enabled.
Therefore return 0 (indicating we don't lack ACS) from
pciDeviceIsBehindSwitchLackingACS().
Signed-off-by: Chris Lalancette <clalance(a)redhat.com>
---
src/util/pci.c | 19 ++++++++++++++-----
1 files changed, 14 insertions(+), 5 deletions(-)
diff --git a/src/util/pci.c b/src/util/pci.c
index 086b751..0defbfe 100644
--- a/src/util/pci.c
+++ b/src/util/pci.c
@@ -1198,11 +1198,20 @@ pciDeviceIsBehindSwitchLackingACS(virConnectPtr conn,
{
pciDevice *parent;
- if (!(parent = pciGetParentDevice(conn, dev))) {
- pciReportError(conn, VIR_ERR_NO_SUPPORT,
- _("Failed to find parent device for %s"),
- dev->name);
- return -1;
+ parent = pciGetParentDevice(conn, dev);
+ if (!parent) {
+ /* if we have no parent, and this is the root bus, ACS doesn't come
+ * into play since devices on the root bus can't P2P without going
+ * through the root IOMMU.
+ */
+ if (dev->bus == 0)
+ return 0;
+ else {
+ pciReportError(conn, VIR_ERR_NO_SUPPORT,
+ _("Failed to find parent device for %s"),
+ dev->name);
+ return -1;
+ }
}
/* XXX we should rather fail when we can't find device's parent and
--
1.6.6
14 years, 10 months
[libvirt] [PATCH] fix AppArmo build errors that crept in recently
by Jamie Strandboge
Not sure when these crept in, but here is a patch so that the build
won't fail when compiling with --enable-compile-warnings=error. Fixes
the following errors:
cc1: warnings being treated as errors
security/security_apparmor.c: In function 'AppArmorSetSecurityAllLabel':
security/security_apparmor.c:381: error: unused variable 'rc'
security/security_apparmor.c: In function
'AppArmorReleaseSecurityLabel':
security/security_apparmor.c:436: error: unused parameter 'conn'
--
Jamie Strandboge | http://www.canonical.com
14 years, 10 months
[libvirt] [PATCH] Remove unused PROC_MOUNT_BUF_LEN #define
by Chris Lalancette
Signed-off-by: Chris Lalancette <clalance(a)redhat.com>
---
src/qemu/qemu_conf.c | 2 --
1 files changed, 0 insertions(+), 2 deletions(-)
diff --git a/src/qemu/qemu_conf.c b/src/qemu/qemu_conf.c
index c843c6a..c227fe1 100644
--- a/src/qemu/qemu_conf.c
+++ b/src/qemu/qemu_conf.c
@@ -92,8 +92,6 @@ VIR_ENUM_IMPL(qemuVideo, VIR_DOMAIN_VIDEO_TYPE_LAST,
"", /* no arg needed for xen */
"" /* don't support vbox */);
-#define PROC_MOUNT_BUF_LEN 255
-
int qemudLoadDriverConfig(struct qemud_driver *driver,
const char *filename) {
virConfPtr conf;
--
1.6.6
14 years, 10 months
[libvirt] [PATCH 0/2] Make filesystems during fs pool build
by David Allan
This patch implements the formatting of block devices in the pool
build operation of filesystem pools. The existing implementation
would only make the directory on which to mount the filesystem, but
required the user to make the filesystem manually. My only concern
with this approach is that it changes the build behavior in a way
that's potentially destructive, so I wonder whether it would be worth
adding a flag to enable this behavior.
Dave
14 years, 10 months
[libvirt] [PATCH 0/2] MultiIQN support
by David Allan
This patch implements multiIQN support for iSCSI pools, allowing the user to specify the initiator IQN that should be used when creating the pool.
14 years, 10 months