On 07/11/2017 10:38 AM, Erik Skultety wrote:
On Mon, Jul 03, 2017 at 05:25:30PM -0400, John Ferlan wrote:
> Since virnodedeviceobj now has a self-lockable hash table, there's no
> need to lock the table from the driver for processing. Thus remove the
> locks from the driver for NodeDeviceObjList mgmt.
>
> Signed-off-by: John Ferlan <jferlan(a)redhat.com>
[..]
> @@ -601,8 +565,6 @@ nodeDeviceDestroy(virNodeDevicePtr device)
> return -1;
> def = virNodeDeviceObjGetDef(obj);
>
> - nodeDeviceLock();
> -
Consider the following scenario handling the same device:
What if I told you that's impossible? You cannot "have" a scsi_hostN,
delete a scsi_hostN, and then have a new one created with the same name.
The scsi_hostN's are an ever increasing value of N. Once created and
deleted, the N will not be reused.
Thread 1 (udev event handler callback) | Thread 2
(nodeDeviceDestroy)
=======================================|=============================
| # attempt to destroy a device
| obj = nodeDeviceObjFindByName()
|
| # @obj is locked
| def = virNodeDeviceObjGetDef(obj)
|
| virNodeDeviceObjEndAPI(&obj)
| # @obj is unlocked
<------
# change event |
udevAddOneDevice() |
|
obj = virNodeDeviceObjListFindByName |
# @obj locked |
new_device = false |
# @obj unlocked |
|
obj = virNodeDeviceObjListAssignDef |
# @obj locked |
virNodeDeviceObjEndAPI(&obj) |
# @obj unlocked and @def changed |
------>
| virNodeDeviceObjListGetParentHost()
| if (def->parent) ===> SIGSEGV
In patch 12/14 this would have been a deadlock because you first locked the
@obj, then nodedev driver while udevAddOneDevice did in the reverse order. I
don't know about NPIV so I'm not sure whether there is another way of removing
a device and updating the parent device tree, might require an updated model,
but for now, you need to make a deep copy of @def. I can see that the chance of
getting an 'change' event from udev on a vHBA device is low, but I'm trying
to
look at it from a higher perspective, as we want to be able to remove mdevs
this way, possibly other devices in the future.
I think what happens is code from virNodeDeviceGetWWNs through
virVHBAManageVport gets placed into a function that handles vHBA's on
deletion. Similarly for CreateXML, vHBA specific code ends up in a
helper function. Those helpers would be called based on the type of
object/device we're talking about (vHBA/mdev).
BTW: I recall doing a review suggesting perhaps creating an mdev pool
driver of sorts. Daniel essentially responded that using the node device
driver and augmenting it to fit the needs would be OK. At the time, I
wasn't specifically thinking about this case, but it certainly qualifies
as something of concern where the existing node device code can make
some assumptions about names that the underlying udev code provides.
You may need to add some extra layer of protection if names can be
reused especially because of this event mgmt "problem"... You may also
need to modify that "if (dev)" code to check if the dev is an mdev and
if so, do something different.
Looks like the code was added by commit id '546fa3ef'. Perhaps as a way
to "follow" how other drivers did things.
"Fortunately" the only consumer of CreateXML/Destroy ends up being vHBA
and we know there's rules around the naming. IIRC - you were letting
MDEV also set the name, right? That is, a 'name' on input is essentially
and happily ignored. So what creates that name? And can you be assured
it's going to be unique for the life/run time of the host? If so,
there's no way a CreateXML could reuse a name that AddOneDevice would be
using, right?
Maybe I need to think some more - it's been a long day already
John
The rest of the patch looks okay, but I want to try it first.
Erik