On Fri, Jan 22, 2016 at 15:23:43 +0000, Daniel P. Berrange wrote:
On Fri, Jan 22, 2016 at 04:17:42PM +0100, Jiri Denemark wrote:
> On Fri, Jan 22, 2016 at 15:07:04 +0000, Daniel P. Berrange wrote:
> > On Thu, Jan 21, 2016 at 11:20:46AM +0100, Jiri Denemark wrote:
> > > VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY and VIR_DOMAIN_PAUSED_POSTCOPY are
> > > used on the source host once migration enters post-copy mode (which
> > > means the domain gets paused on the source). After the destination host
> > > takes over the execution of the domain, its virtual CPUs are resumed and
> > > the domain enters VIR_DOMAIN_RUNNING_POSTCOPY state and
> > > VIR_DOMAIN_EVENT_RESUMED_POSTCOPY event is emitted.
> > >
> > > In case migration fails during post-copy mode and neither host has the
> > > complete state of the domain, both domains will remain paused with
> > > VIR_DOMAIN_PAUSED_POSTCOPY_FAILED reason and an upper layer may decide
> > > what to do.
> > >
> > > Signed-off-by: Jiri Denemark <jdenemar(a)redhat.com>
> >
> > > @@ -2380,6 +2383,8 @@ typedef enum {
> > > VIR_DOMAIN_EVENT_SUSPENDED_RESTORED = 4, /* Restored from paused state file */
> > > VIR_DOMAIN_EVENT_SUSPENDED_FROM_SNAPSHOT = 5, /* Restored from paused snapshot */
> > > VIR_DOMAIN_EVENT_SUSPENDED_API_ERROR = 6, /* suspended after failure during libvirt API call */
> > > + VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY = 7, /* suspended for post-copy migration */
> > > + VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY_FAILED = 8, /* suspended after failed post-copy */
> >
> > Presumably the POSTCOPY_FAILED event can only be emitted
> > on the target, since the source will already be suspended
> > when we see a failure, and it doesn't make sense to issue
> > a suspended event when we're already suspended.
>
> But would it cause any harm? I figured it might be better to emit the
> event and set the state to POSTCOPY_FAILED even on the source so that
> apps/users don't have to guess whether POSTCOPY means it's still running
> or if it already failed.

The lifecycle events are supposed to be implementing a state machine,
and we're not changing state in this case. I think applications that
are currently using libvirt would reasonably consider it an error if
libvirt issues an event for a state it is already in, and I could see
it causing them to mistakenly run some logic twice if they get two
SUSPEND events for the same domain in a row.

We already emit some events several times in a row, but I agree it
doesn't make sense to add more cases like that. It would actually be a
good idea to fix the existing double events (in another patch series in
the future).
Jirka