On 01 Oct 2014, at 12:07, Jiri Denemark <jdenemar(a)redhat.com> wrote:
On Wed, Oct 01, 2014 at 10:45:33 +0200, Cristian KLEIN wrote:
> On 2014-09-30 17:16, Daniel P. Berrange wrote:
>> On Tue, Sep 30, 2014 at 05:11:03PM +0200, Jiri Denemark wrote:
>>> On Tue, Sep 30, 2014 at 16:39:22 +0200, Cristian Klein wrote:
>>>> Signed-off-by: Cristian Klein <cristian.klein(a)cs.umu.se>
>>>> ---
>>>> include/libvirt/libvirt.h.in | 1 +
>>>> src/libvirt.c | 7 +++++++
>>>> 2 files changed, 8 insertions(+)
>>>>
>>>> diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in
>>>> index 5217ab3..82f3aeb 100644
>>>> --- a/include/libvirt/libvirt.h.in
>>>> +++ b/include/libvirt/libvirt.h.in
>>>> @@ -1225,6 +1225,7 @@ typedef enum {
>>>> VIR_MIGRATE_ABORT_ON_ERROR = (1 << 12), /* abort migration on I/O errors happened during migration */
>>>> VIR_MIGRATE_AUTO_CONVERGE = (1 << 13), /* force convergence */
>>>> VIR_MIGRATE_RDMA_PIN_ALL = (1 << 14), /* RDMA memory pinning */
>>>> + VIR_MIGRATE_POSTCOPY = (1 << 15), /* enable (but don't start) post-copy */
>>>> } virDomainMigrateFlags;
>>>
>>> I still think we should add an extra flag to start post copy
>>> immediately. To address your concerns about it, I don't think it's
>>> implementing a policy in libvirt. It's for apps that want to make sure
>>> migration converges without having to spawn another thread and monitor
>>> the progress or wait for a timeout. It's a bit similar to migrating a
>>> paused domain vs. migrating a running domain and pausing it when it
>>> doesn't seem to converge.
>>
>> Your point about spawning another thread makes me wonder if we should
>> actually look at adding a 'VIR_MIGRATE_ASYNC' method (that would require
>> P2P migration of course). If this flag were set, virDomainMigrateXXX would
>> only block for long enough to start the migration and then return.
>>
>> Callers can use the job info API to monitor progress & success/failure.
>>
>> Then we wouldn't have to keep adding flags like you suggest - apps can
>> just easily call the appropriate API right away with no threads needed
>
> This would make a lot of sense. The user would call:
>
> """
> virDomainMigrateXXX(..., VIR_MIGRATE_POSTCOPY | VIR_MIGRATE_ASYNC)
> virDomainMigrateStartPostCopy(...)
> """
>
> Would this be seen as more cumbersome than having a dedicated
> VIR_MIGRATE_POSTCOPY_AUTOSTART?
The ASYNC flag Daniel suggested makes sense, so I guess you can just
ignore my request for a special flag. Although, I don't think the ASYNC
stuff needs to be done within this series, let's just focus on the
post-copy stuff.

Hi Jirka,

I talked to the QEMU post-copy developers (Andrea and Dave, in CC). Starting post-copy
immediately is a bad choice for performance: the VM starts running on the destination
hypervisor before its read-only or kernel memory has arrived. Those pages then have to be
pulled on demand, which causes a lot of overhead and interruptions in the VM's execution.

Instead, it is better to first do one pass of pre-copy and only then trigger post-copy. In
fact, I did an experiment with a video streaming VM: starting post-copy after the first
pass of pre-copy (instead of starting post-copy immediately) reduced downtime from 3.5
seconds to under 1 second.
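
To make the argument concrete, here is a toy model in C. It is not QEMU or libvirt code and
it only counts pages. Every name in it (sketch_result, sketch_postcopy_immediately,
sketch_postcopy_after_precopy) is made up for illustration, and the number of pages dirtied
during the first pass is an assumed input, not something I measured.

```c
#include <assert.h>
#include <stddef.h>

/* Toy page accounting; all names are hypothetical, nothing here is a
 * QEMU or libvirt interface. */
typedef struct {
    size_t precopy_pages;   /* pages sent while the VM still runs on the source */
    size_t postcopy_pages;  /* pages the destination must pull on demand */
} sketch_result;

/* Post-copy started right away: nothing is on the destination yet, so
 * every page (including read-only and kernel memory) is pulled on demand. */
static sketch_result sketch_postcopy_immediately(size_t total_pages)
{
    sketch_result r = { 0, total_pages };
    return r;
}

/* One full pre-copy pass first: only the pages the guest dirtied during
 * that pass (an assumed input) are left for post-copy. */
static sketch_result sketch_postcopy_after_precopy(size_t total_pages,
                                                   size_t dirtied_during_pass)
{
    sketch_result r;

    assert(dirtied_during_pass <= total_pages);
    r.precopy_pages = total_pages;
    r.postcopy_pages = dirtied_during_pass;
    return r;
}
```

With, say, 1000 pages of which 50 are dirtied during the first pass, the model leaves 50
pages for on-demand pulls instead of 1000. It ignores which pages are hot, so it says
nothing precise about downtime.
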

Given all of the above, I propose the following post-copy API in libvirt:

    virDomainMigrateXXX(..., VIR_MIGRATE_ENABLE_POSTCOPY)
    virDomainMigrateStartPostCopy(...) // from a different thread

This is for those who just need the post-copy mechanism and want to implement a policy
themselves.

    virDomainMigrateXXX(..., VIR_MIGRATE_POSTCOPY_AFTER_PRECOPY)

This is for those who want to use post-copy without caring about any low-level details; it
offers a policy that is good enough for most cases.
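
A rough sketch of how the first variant (enable, then start from a different thread) might
look from the caller's side. It does not use libvirt at all: sketch_migrate() and
sketch_start_postcopy() are hypothetical stubs standing in for virDomainMigrateXXX() and
virDomainMigrateStartPostCopy(), the flag value is an assumption, and the stub pretends
that pre-copy never converges once post-copy is enabled.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag and stubs; they only mimic the proposed call pattern. */
#define SKETCH_MIGRATE_ENABLE_POSTCOPY (1u << 15)

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  wake;
    unsigned int    flags;     /* flags given to sketch_migrate() */
    int             started;   /* 1 once sketch_migrate() has been called */
    int             active;    /* 1 while sketch_migrate() is blocked */
    int             postcopy;  /* 1 once post-copy has been started */
} sketch_domain;

/* Stand-in for the blocking migrate call. With post-copy enabled it only
 * returns after another thread starts post-copy; otherwise it returns
 * at once, as if pre-copy had converged. */
static int sketch_migrate(sketch_domain *d, unsigned int flags)
{
    assert(d != NULL);
    pthread_mutex_lock(&d->lock);
    d->flags = flags;
    d->started = 1;
    d->active = 1;
    pthread_cond_broadcast(&d->wake);
    if (flags & SKETCH_MIGRATE_ENABLE_POSTCOPY) {
        while (!d->postcopy)
            pthread_cond_wait(&d->wake, &d->lock);
    }
    d->active = 0;
    pthread_mutex_unlock(&d->lock);
    return 0;
}

/* Stand-in for the start-post-copy call. Waiting for the migration to
 * begin is a simplification that keeps the sketch deterministic. */
static int sketch_start_postcopy(sketch_domain *d)
{
    int ret = -1;

    assert(d != NULL);
    pthread_mutex_lock(&d->lock);
    while (!d->started)
        pthread_cond_wait(&d->wake, &d->lock);
    if (d->active && (d->flags & SKETCH_MIGRATE_ENABLE_POSTCOPY)) {
        d->postcopy = 1;
        pthread_cond_broadcast(&d->wake);
        ret = 0;
    }
    pthread_mutex_unlock(&d->lock);
    return ret;
}

/* The application's policy lives here: it decides when to switch. */
static void *sketch_policy_thread(void *arg)
{
    return (void *)(intptr_t)sketch_start_postcopy(arg);
}

/* Returns 1 if the migration ended in post-copy, 0 if it ended in
 * pre-copy, -1 if the policy thread could not be created. */
static int sketch_run(unsigned int flags)
{
    sketch_domain d;
    pthread_t tid;
    int result;

    pthread_mutex_init(&d.lock, NULL);
    pthread_cond_init(&d.wake, NULL);
    d.flags = 0;
    d.started = d.active = d.postcopy = 0;

    if (pthread_create(&tid, NULL, sketch_policy_thread, &d) != 0)
        return -1;
    sketch_migrate(&d, flags);
    pthread_join(tid, NULL);
    result = d.postcopy;

    pthread_cond_destroy(&d.wake);
    pthread_mutex_destroy(&d.lock);
    return result;
}
```

The extra thread is exactly the burden the second variant would remove: with
VIR_MIGRATE_POSTCOPY_AFTER_PRECOPY the caller would make a single blocking call and
libvirt would decide when to switch.
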
What do you think? Would you accept patches that implement this API?
Cristian