
On 16/01/14 23:39, John Ferlan wrote:
On 01/16/2014 08:53 AM, Osier Yang wrote:
<...snip...>
It took some thinking, but I convinced myself that this path doesn't need the shared check since it's only called from qemuProcessReconnect; however, if something else did call it some day then that check may be necessary. It may be wise to add it anyway... I have no strong opinion about it being required for this change.
Missed reply for this one.
Actually it needs the checking. Assuming there are 2 live domains, and they are using the same SCSI generic device. It's possible since the device could be shared among domains. And in that case, we should update the "dev->used_by" array instead of adding the device to the list ("driver->activeScsiHostdevs") directly, during the reconnecting sequence. I.E it needs to restore the internal data correctly to the state just as the one before libvirtd getting shutdown.
The activeScsiHostdevs is listed off the driver and not the domain,
Yes. It's of the driver.
so it's not like each domain has to update which other domains is sharing.
But the list is filled while each domain is starting or *reconnecting*
Maybe I'm misreading your thoughts regarding needing to update instead of add.
If there were 2..n already running with the share attribute already set, would the reconnect threads run/finish before the new libvirtd accepts new connections/domain starts?
Yes. See daemonRunStateInit.
If so, then this code will restore life as it was before libvirtd was stopped.
Yes. That's what this patch tries to do.
The only way those domains would have been able to start previously and share the device is if they ran through the qemuPrepareHostdevSCSIDevices() which does check shareable.
Yes, that's why I didn't do the checking about "shareable" in qemuUpdateActivePciHostdevs.
My assumption was after some amount of code reading - those restart threads would finish so that it's not possible for some domain to be started that isn't sharing, but wanting to use the same device.
Exactly right.
Since prior code doesn't allow sharing of these devices there doesn't need to be a consideration made for an already running domain when this code first appears. Although, you may run into situations where qemuPrepareHostdevSCSIDevices() fails due to a running domain that was running before the shareable attribute was known.
Indeed, but we have no way to avoid this problem for the domains which was running against the old libvirt, and the libvirt is upgraded after that.
That domain would have to be restarted - something that could be documentable. The error message could be more helpful indicating it's in use without the shareable attribute by some other domain.
It's more complicate than this, when a domain is trying to start, the situations could be (with new code): 1) No domain is using the device 2) 1 domain is using the device 3) More than 1 domain are using the device. For 1), it's just fine, let's ignore it. For 2), the domain which is using it could use the device as either "shareable" or "non-shareable". If it's "shareable", the later domain could be started successfully only if it uses the "device" as "shareable" too; If it's "non-shareable", the later domain can't use the device anyway. For 3), the domains must use the device as "shareable". And the later domain must use the device as "shareable" too to be started successfully. So, as you seen, we can't simply say with error "the device is in use without shareable attribute by some other domain". We can use if...else branches to construct the more sensible errors, but I'm doubting about if we should make things more complicate. Though for old code, simply saying "the device is in use without the shareable attribute by some other domain" is fine, but if you think about the situations in new code, it will be a mess.
In that case, only 1 domain would be using it thus in theory used_by[0] would be that domain. If there were more than 1 domain (n_used_by), then you've got other problems in the code!
I guess here you still mean the domain(s) which were running with the old code. And since with the old code, there could be only 1 domain using the same device, regardless of the device is specified as "shareable" or not. So yet, the "n_used_by" must be 1. As a conclusion, I think the only concern from you is about the problem on the running domain of an old libvirt (without these 2 patches). Right? If so, my thought is to add document somewhere, though I have not much idea about where to put the document, and how to write the document to explain the problem which is such complicated. Osier