
On Fri, Jul 14, 2017 at 11:37:36AM -0400, John Ferlan wrote:
On 07/14/2017 05:23 AM, Erik Skultety wrote:
On Thu, Jul 13, 2017 at 07:23:58PM -0400, John Ferlan wrote:
On 07/11/2017 10:38 AM, Erik Skultety wrote:
On Mon, Jul 03, 2017 at 05:25:30PM -0400, John Ferlan wrote:
Since virnodedeviceobj now has a self-lockable hash table, there's no need to lock the table from the driver for processing. Thus remove the locks from the driver for NodeDeviceObjList mgmt.
Signed-off-by: John Ferlan <jferlan@redhat.com>
[..]
@@ -601,8 +565,6 @@ nodeDeviceDestroy(virNodeDevicePtr device) return -1; def = virNodeDeviceObjGetDef(obj);
- nodeDeviceLock(); -
Consider the following scenario handling the same device:
What if I told you that's impossible? You cannot "have" a scsi_hostN, delete a scsi_hostN, and then have a new one created with the same name.
Except I wasn't considering creation, rather than plain change. Although I didn't manage to find out under what circumstances would kernel actually emit the 'change' event (I tried to write to various writable attributes - but since it either lacks documentation completely or it's just well hidden from me - I wasn't able to trigger it), other than explicitly triggering it by writing to sysfs' uevent - so in that aspect, as I wrote below in my previous response, it's highly unlikely but not impossible to hit the race.
Not quite sure what would trigger a change event. A vHBA has a wwnn/wwpn and a parent wwnn/wwpn and WWN. If any of those change, the vHBA would need to be deleted and recreated.
I do have a faint recollection of considering the ramifications of dropping the obj lock in that path and the race drawback, but I dismissed it mainly because of "how" vHBA's are created and what could constitute a change event for a vHBA essentially redefines it.
The scsi_hostN's are an ever increasing value of N. Once created and deleted, the N will not be reused.
Thread 1 (udev event handler callback) | Thread 2 (nodeDeviceDestroy) =======================================|============================= | # attempt to destroy a device | obj = nodeDeviceObjFindByName() | | # @obj is locked | def = virNodeDeviceObjGetDef(obj) | | virNodeDeviceObjEndAPI(&obj) | # @obj is unlocked <------ # change event | udevAddOneDevice() | | obj = virNodeDeviceObjListFindByName | # @obj locked | new_device = false | # @obj unlocked | | obj = virNodeDeviceObjListAssignDef | # @obj locked | virNodeDeviceObjEndAPI(&obj) | # @obj unlocked and @def changed | ------> | virNodeDeviceObjListGetParentHost() | if (def->parent) ===> SIGSEGV
In patch 12/14 this would have been a deadlock because you first locked the @obj, then nodedev driver while udevAddOneDevice did in the reverse order. I don't know about NPIV so I'm not sure whether there is another way of removing a device and updating the parent device tree, might require an updated model, but for now, you need to make a deep copy of @def. I can see that the chance of getting an 'change' event from udev on a vHBA device is low, but I'm trying to look at it from a higher perspective, as we want to be able to remove mdevs this way, possibly other devices in the future.
I think what happens is code from virNodeDeviceGetWWNs through virVHBAManageVport gets placed into a function that handles vHBA's on deletion. Similarly for CreateXML, vHBA specific code ends up in a helper function. Those helpers would be called based on the type of object/device we're talking about (vHBA/mdev).
Despite the likelihood of the case I'm describing, the main point I'm trying to make is that the lock protects a mutable resource (@def) and by releasing it followed by querying it without actually holding the lock violates thread safety.
I understand the position and while the likelihood is essentially next to zero that something like that could happen it's also possible to remove @def and pass copies of the name, parent, parent_wwnn, parent_wwpn, and parent_fabric_wwn.
So before I do that - can we close on patches 1-12?
Yeah, I wasn't explicit with the ACK for 1-12 because I first wanted to have a small discussion about the changes, but we can now focus solely on the rest. So, ACK to 1-12 with the proposed change to virNode*ObjNew not consuming @def (I don't have a strong preference here that I could back with some reasonable pros/cons, to me, having it behave like a constructor for OOP, which we're essentially trying to copy in libvirt, naturally makes sense, but I also see Michal's point). Erik