On 16/01/14 23:39, John Ferlan wrote:
On 01/16/2014 08:53 AM, Osier Yang wrote:
<...snip...>
>> It took some thinking, but I convinced myself that this path doesn't
>> need the shared check since it's only called from qemuProcessReconnect;
>> however, if something else did call it some day then that check may be
>> necessary. It may be wise to add it anyway... I have no strong opinion
>> about it being required for this change.
>>
> Missed reply for this one.
>
> Actually it needs the checking. Assuming there are 2 live domains, and
> they are using the same SCSI generic device. It's possible since the
> device could be shared among domains. And in that case, we should
> update the "dev->used_by" array instead of adding the device to the
list
> ("driver->activeScsiHostdevs") directly, during the reconnecting
sequence.
> I.E it needs to restore the internal data correctly to the state just as the
> one before libvirtd getting shutdown.
>
The activeScsiHostdevs is listed off the driver and not the domain,
Yes. It's of the driver.
so
it's not like each domain has to update which other domains is sharing.
But the list is filled while each domain is starting or *reconnecting*
Maybe I'm misreading your thoughts regarding needing to update
instead
of add.
If there were 2..n already running with the share attribute already set,
would the reconnect threads run/finish before the new libvirtd accepts
new connections/domain starts?
Yes. See daemonRunStateInit.
If so, then this code will restore life
as it was before libvirtd was stopped.
Yes. That's what this patch tries to do.
The only way those domains would
have been able to start previously and share the device is if they ran
through the qemuPrepareHostdevSCSIDevices() which does check shareable.
Yes, that's why I didn't do the checking about "shareable" in
qemuUpdateActivePciHostdevs.
My assumption was after some amount of code reading - those restart
threads would finish so that it's not possible for some domain to be
started that isn't sharing, but wanting to use the same device.
Exactly right.
Since prior code doesn't allow sharing of these devices there doesn't
need to be a consideration made for an already running domain when this
code first appears. Although, you may run into situations where
qemuPrepareHostdevSCSIDevices() fails due to a running domain that was
running before the shareable attribute was known.
Indeed, but we have no way to avoid this problem for the domains
which was running against the old libvirt, and the libvirt is upgraded
after that.
That domain would have
to be restarted - something that could be documentable. The error
message could be more helpful indicating it's in use without the
shareable attribute by some other domain.
It's more complicate than this, when a domain is trying to start,
the situations could be (with new code):
1) No domain is using the device
2) 1 domain is using the device
3) More than 1 domain are using the device.
For 1), it's just fine, let's ignore it.
For 2), the domain which is using it could use the device as either
"shareable" or "non-shareable". If it's "shareable",
the later domain
could be started successfully only if it uses the "device" as
"shareable"
too; If it's "non-shareable", the later domain can't use the device
anyway.
For 3), the domains must use the device as "shareable". And the later
domain must use the device as "shareable" too to be started successfully.
So, as you seen, we can't simply say with error "the device is in use
without shareable attribute by some other domain". We can use if...else
branches to construct the more sensible errors, but I'm doubting about
if we should make things more complicate.
Though for old code, simply saying "the device is in use without
the shareable attribute by some other domain" is fine, but if you think
about the situations in new code, it will be a mess.
In that case, only 1 domain
would be using it thus in theory used_by[0] would be that domain. If
there were more than 1 domain (n_used_by), then you've got other
problems in the code!
I guess here you still mean the domain(s) which were running with
the old code. And since with the old code, there could be only 1
domain using the same device, regardless of the device is specified
as "shareable" or not. So yet, the "n_used_by" must be 1.
As a conclusion, I think the only concern from you is about the problem
on the running domain of an old libvirt (without these 2 patches). Right?
If so, my thought is to add document somewhere, though I have not
much idea about where to put the document, and how to write the
document to explain the problem which is such complicated.
Osier