On 07/14/2017 11:37 AM, John Ferlan wrote:
On 07/14/2017 05:23 AM, Erik Skultety wrote:
> On Thu, Jul 13, 2017 at 07:23:58PM -0400, John Ferlan wrote:
>>
>>
>> On 07/11/2017 10:38 AM, Erik Skultety wrote:
>>> On Mon, Jul 03, 2017 at 05:25:30PM -0400, John Ferlan wrote:
>>>> Since virnodedeviceobj now has a self-lockable hash table, there's
no
>>>> need to lock the table from the driver for processing. Thus remove the
>>>> locks from the driver for NodeDeviceObjList mgmt.
>>>>
>>>> Signed-off-by: John Ferlan <jferlan(a)redhat.com>
>>>
>>> [..]
>>>
>>>> @@ -601,8 +565,6 @@ nodeDeviceDestroy(virNodeDevicePtr device)
>>>> return -1;
>>>> def = virNodeDeviceObjGetDef(obj);
>>>>
>>>> - nodeDeviceLock();
>>>> -
>>>
>>> Consider the following scenario handling the same device:
>>>
>>
>> What if I told you that's impossible? You cannot "have" a
scsi_hostN,
>> delete a scsi_hostN, and then have a new one created with the same name.
>
> Except I wasn't considering creation, rather than plain change. Although I
> didn't manage to find out under what circumstances would kernel actually emit
> the 'change' event (I tried to write to various writable attributes - but
since
> it either lacks documentation completely or it's just well hidden from me - I
> wasn't able to trigger it), other than explicitly triggering it by writing to
> sysfs' uevent - so in that aspect, as I wrote below in my previous response,
> it's highly unlikely but not impossible to hit the race.
>
Not quite sure what would trigger a change event. A vHBA has a wwnn/wwpn
and a parent wwnn/wwpn and WWN. If any of those change, the vHBA would
need to be deleted and recreated.
I do have a faint recollection of considering the ramifications of
dropping the obj lock in that path and the race drawback, but I
dismissed it mainly because of "how" vHBA's are created and what could
constitute a change event for a vHBA essentially redefines it.
>>
>> The scsi_hostN's are an ever increasing value of N. Once created and
>> deleted, the N will not be reused.
>>
>>> Thread 1 (udev event handler callback) | Thread 2 (nodeDeviceDestroy)
>>> =======================================|=============================
>>> | # attempt to destroy a device
>>> | obj = nodeDeviceObjFindByName()
>>> |
>>> | # @obj is locked
>>> | def = virNodeDeviceObjGetDef(obj)
>>> |
>>> | virNodeDeviceObjEndAPI(&obj)
>>> | # @obj is unlocked
>>> <------
>>> # change event |
>>> udevAddOneDevice() |
>>> |
>>> obj = virNodeDeviceObjListFindByName |
>>> # @obj locked |
>>> new_device = false |
>>> # @obj unlocked |
>>> |
>>> obj = virNodeDeviceObjListAssignDef |
>>> # @obj locked |
>>> virNodeDeviceObjEndAPI(&obj) |
>>> # @obj unlocked and @def changed |
>>> ------>
>>> | virNodeDeviceObjListGetParentHost()
>>> | if (def->parent) ===>
SIGSEGV
>>>
>>> In patch 12/14 this would have been a deadlock because you first locked the
>>> @obj, then nodedev driver while udevAddOneDevice did in the reverse order. I
>>> don't know about NPIV so I'm not sure whether there is another way of
removing
>>> a device and updating the parent device tree, might require an updated
model,
>>> but for now, you need to make a deep copy of @def. I can see that the chance
of
>>> getting an 'change' event from udev on a vHBA device is low, but
I'm trying to
>>> look at it from a higher perspective, as we want to be able to remove mdevs
>>> this way, possibly other devices in the future.
>>
>> I think what happens is code from virNodeDeviceGetWWNs through
>> virVHBAManageVport gets placed into a function that handles vHBA's on
>> deletion. Similarly for CreateXML, vHBA specific code ends up in a
>> helper function. Those helpers would be called based on the type of
>> object/device we're talking about (vHBA/mdev).
>
> Despite the likelihood of the case I'm describing, the main point I'm trying
to
> make is that the lock protects a mutable resource (@def) and by releasing it
> followed by querying it without actually holding the lock violates thread
> safety.
>
I understand the position and while the likelihood is essentially next
to zero that something like that could happen it's also possible to
remove @def and pass copies of the name, parent, parent_wwnn,
parent_wwpn, and parent_fabric_wwn.
So before I do that - can we close on patches 1-12?
Let me amend this statement to include - modulo passing @def to
virNodeDeviceObjNew in patch 5 as Michal has requested in his review of
Secrets and NWFilter that @def is not passed to the vir*ObjNew.
I will also post a reversal for Interfaces to keep everything the same...
John
I can then post a v5 that alters the
virNodeDeviceObjListGetParentHost
to take parameters instead of @def. That should allay this concern and
make patches 13/14 be 2 and 3 of the next series.
John
>>
>> BTW: I recall doing a review suggesting perhaps creating an mdev pool
>> driver of sorts. Daniel essentially responded that using the node device
>> driver and augmenting it to fit the needs would be OK. At the time, I
>> wasn't specifically thinking about this case, but it certainly qualifies
>> as something of concern where the existing node device code can make
>> some assumptions about names that the underlying udev code provides.
>>
>> You may need to add some extra layer of protection if names can be
>> reused especially because of this event mgmt "problem"... You may also
>> need to modify that "if (dev)" code to check if the dev is an mdev and
>> if so, do something different.
>>
>> Looks like the code was added by commit id '546fa3ef'. Perhaps as a way
>> to "follow" how other drivers did things.
>>
>> "Fortunately" the only consumer of CreateXML/Destroy ends up being
vHBA
>> and we know there's rules around the naming. IIRC - you were letting
>> MDEV also set the name, right? That is, a 'name' on input is essentially
>
> No, mdev's name is a readonly attribute that is optional and exposed by the
> vendor, the only thing libvirt can currently write to the mdev's sysfs
> interface is UUID.
>
> Erik
>
>> and happily ignored. So what creates that name? And can you be assured
>> it's going to be unique for the life/run time of the host? If so,
>> there's no way a CreateXML could reuse a name that AddOneDevice would be
>> using, right?
>>
>> Maybe I need to think some more - it's been a long day already
>>
>> John
>>
>>>
>>> The rest of the patch looks okay, but I want to try it first.
>>>
>>> Erik
>>>
--
libvir-list mailing list
libvir-list(a)redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list