On 7/18/19 11:28 AM, Daniel P. Berrangé wrote:
On Thu, Jul 18, 2019 at 11:14:49AM +0200, Michal Privoznik wrote:
> If there are two identical paths on the list we need to lock
> the path only once, because open()-ing it a second time would
> fail, and even if it didn't, taking the lock a second time
> would fail for sure. After all, locking each path just once is
> sufficient to satisfy the caller.
>
> Reported-by: Daniel Henrique Barboza <danielhb413(a)gmail.com>
> Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com>
> ---
> src/security/security_manager.c | 23 +++++++++++++++++++++--
> 1 file changed, 21 insertions(+), 2 deletions(-)
Reviewed-by: Daniel P. Berrangé <berrange(a)redhat.com>
>
> diff --git a/src/security/security_manager.c b/src/security/security_manager.c
> index ade2c96141..7c905f0785 100644
> --- a/src/security/security_manager.c
> +++ b/src/security/security_manager.c
> @@ -1294,16 +1294,35 @@ virSecurityManagerMetadataLock(virSecurityManagerPtr mgr ATTRIBUTE_UNUSED,
> * paths A B and there's another that is trying to lock them
> * in reversed order a deadlock might occur. But if we sort
> * the paths alphabetically then both processes will try lock
> - * paths in the same order and thus no deadlock can occur. */
> + * paths in the same order and thus no deadlock can occur.
> + * Lastly, it makes searching for duplicate paths below
> + * simpler. */
> qsort(paths, npaths, sizeof(*paths), cmpstringp);
>
> for (i = 0; i < npaths; i++) {
> const char *p = paths[i];
> struct stat sb;
> + size_t j;
> int retries = 10 * 1000;
> int fd;
>
> - if (!p || stat(p, &sb) < 0)
> + if (!p)
> + continue;
> +
> + /* If there's a duplicate path on the list, skip it over.
> + * Not only would we fail open()-ing it the second time,
> + * we would also deadlock with ourselves trying to lock it
> + * the second time: after all, we already locked it when
> + * iterating over it the first time. */
Presumably this deals with the problem at cold boot. Is it possible
to hit the same problem when we have cold-plugged one device and
then later try to hotplug another device using the same resource,
or do the SRIOV assignment grouping requirements make the hotplug
impossible?
Okay, so digging deeper I think I know what the problem is. In general
it is of course possible to open a file multiple times, but that is
not true for "/dev/vfio/N", which can be opened exactly once. Indeed,
looking into vfio_group_fops_open() in kernel.git/drivers/vfio/vfio.c,
if a group is already open then -EBUSY is returned (which is what we
saw in the error message). In fact, git log blames v3.11-rc1~46^2~1,
which explicitly disallowed multiple open()-s.
So, in theory, our code works: we open() the files we are chown()-ing
and close them immediately afterwards, so the case where we try to
lock /dev/vfio/N multiple times ourselves won't happen. However,
since qemu already has the device open, we will fail opening it.
I can see two options here:
1) Ignore this specific error in virSecurityManagerMetadataLock().
This of course means that the caller (virSecurityDACTransactionRun())
will chown() the /dev/vfio/N file without holding the lock, but then
again, we have namespaces and there can't be any other domain using
the path. Also, we don't actually chown() if the file already has the
correct owner - which will be the case 99.99% of the time, unless
there were different seclabels for two PCI devices (do we even allow
<seclabel/> for <hostdev/>?).
2) Make virSecurityDACSetHostdevLabel() and
virSecurityDACRestoreHostdevLabel() no-ops if the domain already/still
has a device from the same IOMMU group as the one we're
attaching/detaching. If they are no-ops then no relabel is done, and
thus the code I'm modifying in 1) is never even called.
I have a preference for 1) because the code will be cleaner IMO.
Michal