On Thu, Apr 16, 2015 at 02:28:52PM +0800, zhang bo wrote:
On 2015/4/10 15:54, Jiri Denemark wrote:
> On Wed, Apr 08, 2015 at 15:40:36 +0800, zhang bo wrote:
>> We recently encountered a problem:
>> 1) migrate a domain
>> 2) the client unexpectedly *crashed* (let's say it is the virsh command)
>> 3) *libvirtd still kept migrating the domain*
>> 4) after it is restarted, the client doesn't know the guest is still
>> migrating.
>>
>> The problem is that libvirtd and the client have different views of the task state.
>> After migration, the client may wrongly conclude that something is wrong because
>> the domain got unexpectedly migrated.
>>
>> In my opinion, libvirtd should just *execute* tasks, like the hands of a human,
>> while clients should be the brain to *schedule and remember* tasks.
>>
>> So, in order to avoid this problem, we should let the client record all the tasks
>> somewhere, and reload the states after its restart. The client may cancel or
>> continue the task as it wishes.
>> Libvirtd should not record the task status.
>
> Not really. It's libvirtd, the daemon, which has to remember everything.
> It manages the state of all domains running on a host and synchronizes
> all clients that want to change state of the domains. Remember, even if
> a client is not restarted, domains may unexpectedly migrate somewhere
> else because another client might have asked for it.
>
> That said, if you're implementing a higher management layer which
> manages domains using libvirt and you know it is going to be the only
> client talking directly to libvirt, you can remember the state there if
> you want. However, it's not something libvirt itself should or could do.
> But you will most likely need to synchronize the state with libvirtd in
> case the client is restarted. Even libvirtd has to synchronize its
> internal state with all running QEMU processes when it is restarted
> because the state might have changed.
>
> Jirka
>
Thank you Jirka.
Let's go a step further: suppose that the client doesn't crash at step 2),
but is just disconnected from libvirtd on the src side.
1) client(nova) calls virDomainMigrateToURI2() to migrate a guest
2) libvirtd at src side connects to libvirtd at dest side.
3) Unfortunately, somehow, client(nova) gets disconnected from libvirtd while migrating
the guest.
4) the API virDomainMigrateToURI2() returns with error in client(nova)
5) but libvirtd isn't aware that the connection to the client is broken, and keeps
migrating the guest to dest.

libvirtd is aware of that, but that doesn't mean it should stop the
migration. If the virDomainMigrateToURI2() call got through the wire,
libvirtd started migrating.

6) the guest is migrated to the dest side eventually.
7) Because nova on the src side thinks the migration did not succeed (step 4), nova
on the dest side will consider the migrated-in guest an unexpectedly running guest, and will
shut it down.

nova knows the exact error that occurred and thus it can differentiate
between "error: Cannot migrate because 'asdf'" and "error: XML-RPC:
connection broken" or whatever. If the connection was broken, nova
must get all new info (refresh its knowledge state) from libvirt upon
new connection.
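
To make the distinction above concrete, here is a minimal sketch of the
decision a client could take once virDomainMigrateToURI2() has returned an
error. It is only an illustration under stated assumptions: the function name,
the Outcome values and the four boolean inputs are hypothetical, and the client
is assumed to fill them in by reconnecting to libvirtd on both hosts and
querying the domain and job state. It is not nova code and it does not call
libvirt itself.

```python
from enum import Enum


class Outcome(Enum):
    FAILED = "migration failed; guest stays on source"
    IN_PROGRESS = "migration still running; poll again later"
    MIGRATED = "guest now runs on destination"
    UNKNOWN = "guest not running on either host"


def reconcile(connection_lost, running_on_src, running_on_dst, job_active):
    """Decide what the client should believe after the migrate call errored.

    All four arguments are hypothetical flags that the client gathers itself:
      connection_lost -- the error was a broken connection, not a migration error
      running_on_src  -- after reconnecting, the guest is running on the source
      running_on_dst  -- after reconnecting, the guest is running on the destination
      job_active      -- the source still reports an active migration job
    """
    if not connection_lost:
        # libvirtd itself reported why the migration could not be done.
        return Outcome.FAILED
    if job_active:
        # The request got through the wire; the daemon is still migrating.
        return Outcome.IN_PROGRESS
    if running_on_dst and not running_on_src:
        # The migration finished while the client was disconnected.
        return Outcome.MIGRATED
    if running_on_src:
        # No job is running and the guest never left the source.
        return Outcome.FAILED
    return Outcome.UNKNOWN


# The scenario from steps 1) to 7): connection lost, guest ended up on dest.
print(reconcile(True, False, True, False).value)
```

With a check like this, the dest side would only treat the guest as unexpected
when the outcome is FAILED, and would keep it when the outcome is MIGRATED.
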
The guest disappears in the end, due to the earlier disconnection between the
libvirtd client and server.
Even though libvirtd remembers everything, the client on the dest side still wrongly killed
the guest after migration.
So, how can we solve this problem? Should libvirtd keep watching its clients' connections,
and cancel running jobs belonging to a disconnected client immediately after the client
disconnects?
--
libvir-list mailing list
libvir-list(a)redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list