On 2015/4/23 17:46, Michal Privoznik wrote:
On 23.04.2015 11:32, Peter Krempa wrote:
> On Thu, Apr 23, 2015 at 11:19:34 +0200, Michal Privoznik wrote:
>> On 23.04.2015 10:30, Peter Krempa wrote:
>>> On Thu, Apr 23, 2015 at 11:44:49 +0800, zhang bo wrote:
>>>> The function qemuBuildCommandLine() may take a long time, for example
>>>> if we configure tens of vifs for the guest, each may cost hundrands of
>>>> milliseconds to create tap dev, senconds in total. Thus, unlock vm
>>>> before calling it.
>>
>
>>>> Signed-off-by:
Zhang Bo <oscar.zhangbo(a)huawei.com>
>>>> Signed-off-by: Zhou Yimin <zhouyimin(a)huawei.com>
>>>> ---
>>>> src/qemu/qemu_process.c | 6 +++++-
>>>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>
>
>>>> diff --git
a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
>>>> index 753afe8..d1aaaec 100644
>>>> --- a/src/qemu/qemu_process.c
>>>> +++ b/src/qemu/qemu_process.c
>>>> @@ -4628,14 +4628,18 @@ int qemuProcessStart(virConnectPtr conn,
>>>> }
>>
>
>>>>
VIR_DEBUG("Building emulator command line");
>>>> + virObjectUnlock(vm);
>>>> if (!(cmd = qemuBuildCommandLine(conn, driver, vm->def,
priv->monConfig,
>>>> priv->monJSON,
priv->qemuCaps,
>>>> migrateFrom, stdin_fd, snapshot,
vmop,
>>>> &buildCommandLineCallbacks,
false,
>>>> qemuCheckFips(),
>>>> priv->autoNodeset,
>>>> - &nnicindexes,
&nicindexes)))
>>>> + &nnicindexes,
&nicindexes))) {
>>>> + virObjectLock(vm);
>>>> goto cleanup;
>>>> + }
>>>> + virObjectLock(vm);
>
>
>>> Why do you need to unlock the object? The VM
is starting at this point
>>> so you won't be able to save any time since APIs will either be blocked
>>> by a job or by the fact that the VM was not started.
>
>> Not true. We still allow QUERY jobs, or APIs that
lock a domain object
>> but don't necessarily take a job. For instance, if in one thread I'm
>> doing virDomainGetOSType(), virDomainGetInfo(), ... (essentially 'virsh
>> dominfo') while the other just starts the domain. I don't see a reason
>> why virDomainGetOSType() should wait for the domain to be started up.
>> Domain state should have no affect on the guest OS type, should it?
>
> OK that is technically right. I wanted to point out that most of the
> stats are available only for online machines or it shouldn't be much of
> a problem to dealy the delivery.
>
> Your example certainly doesn't illustrate why the delay to format the
> command line should be a problem when using libvirt. Or are you polling
> virDomainGetOSType every millisecond?
>
> I am curious to see why the delay would be a problem.
Yep, I'm too. So far we don't really care much about libvirt response
time (otherwise our critical sections would be much shorter), but maybe
it's an issue for somebody.
The specific semantic is:
migrating multiple guests simultaneously, downtime of each guest will add up,
even to an unacceptable value.
1 suppose the downtime is 2 seconds when we migrate only 1 guest at one time.
2 suppose it costs 0.5sec to create each tapDev, and each guest has 20 vifs,
that's 10 seconds for qemuBuildCommandLine.
3 now, we migrate 10 guest simultaneously, the downtime of the guests will vary from
seconds to even 100 seconds.
The reason for the problem is that:
1 guestA locks vm while creating each tapDev(virNetDevTapCreate) in
qemuBuildCommandLine(), for 10seconds
2 guestB calls qemuMigrationPrepareAny->*virDomainObjListAdd* to get its vm object,
which locks 'doms'
and waits for the vm lock.
3 doms will be locked until guestA unlock its vm, we say that's 10 seconds.
4 guestC calls qemuDomainMigrateFinish3->virDomainObjListFindByName, which tries to
lock doms. because it's
now locked by guestB, guestC blocks here, and it can't be unpaused for at least 10
seconds.
5 then comes to guestD, guestE, guestF, etc, the downtime will be added up, to even 50
seconds or more.
6 the command 'virsh list' is blocked as well.
Thus, we think the problem must be solved.
>
>
>
>>> On the other hand, I don't think we can
just lock and unlock the domain
>>> object as we please. qemuBuildCommandLine is a very complex function and
>>> as such it calls many others. Some of them may rely on the fact, that
>>> the object is locked by caller.
>
>> Well, you definitely can't since almost
everything in there is accessing
>> vm->def. Locking semantics would be broken right in the line after @vm
>> was unlocked by dereferencing it.
>
> Well, anything that would change a domain definition should grap a
> MODIFY job. But such jobs are serialized, so even if we unlock the
> domain we should be okay to still access vm->def. What I am more worried
> about are all the small functions that interact with system or
> something. Currently they are serialized by @vm lock, but one we remove
> it ...
>
> Michal
>
> .
>
After the discussion above, maybe it's better to move virNetDevTapCreate() prior to
qemuBuildCommandLine(), what do
you think about that? If that's ok, I'd like to apply patchV2 here.