Ping
Since a reboot involves both qemu and libvirtd, it is not an atomic job, so this
may not be purely a qemu bug.
Maybe libvirtd should also send the RESET qmp command after migration. What do you
think?
We tried sending fakeReboot to the dest side and that seems to fix the problem,
but we don't know whether it's the right solution.
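To make the race concrete, here is a toy model in C. It is only a sketch under
assumptions, not libvirt code: toy_domain, toy_reboot, toy_migrate and
toy_on_shutdown are names made up for this mail, and fake_reboot merely stands
in for priv->fakeReboot. It shows that the 'shutdown' handler picks 'poweroff'
on the dest unless the flag travels with the guest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these types and functions are hypothetical and are
 * not libvirt or QEMU APIs. */
typedef enum {
    TOY_CMD_RESET,    /* restart the guest: what a reboot needs */
    TOY_CMD_POWEROFF  /* tear the guest down */
} toy_cmd;

typedef struct {
    bool fake_reboot; /* stands in for priv->fakeReboot */
} toy_domain;

/* Reboot request: the guest is asked to power down, and we remember
 * that a reset must follow once it has shut down. */
static void toy_reboot(toy_domain *dom)
{
    assert(dom != NULL);
    dom->fake_reboot = true;
}

/* Migration: the destination starts from fresh private state. The
 * flag survives only if it is explicitly carried over. */
static toy_domain toy_migrate(const toy_domain *src, bool carry_flag)
{
    toy_domain dst = { .fake_reboot = false };

    assert(src != NULL);
    if (carry_flag)
        dst.fake_reboot = src->fake_reboot;
    return dst;
}

/* "shutdown" event handler: decide what to do with the stopped guest. */
static toy_cmd toy_on_shutdown(toy_domain *dom)
{
    assert(dom != NULL);
    if (dom->fake_reboot) {
        dom->fake_reboot = false; /* the reboot is being completed now */
        return TOY_CMD_RESET;
    }
    return TOY_CMD_POWEROFF;
}
```

With carry_flag false, toy_on_shutdown() on the migrated copy returns
TOY_CMD_POWEROFF; with carry_flag true it returns TOY_CMD_RESET, which matches
what we would expect from sending fakeReboot to the dest side.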
>>> Hi all:
>>> Here are the steps to reproduce the problem:
>>> 1 reboot the guest with the flag VIR_DOMAIN_REBOOT_ACPI_POWER_BTN
>>> 2 sleep 1 second (so that the guest is still rebooting, although the API
>>> has already returned)
>>> 3 migrate the guest
>>>
>>> The problem is that the guest failed to migrate to the dest, and crashed
>>> on the source side.
>>>
>>> We didn't dig further into the problem; the root cause, we think, is
>>> that we migrated a guest while it was rebooting.
>>
>>Migration is expected to work no matter what state the guest OS is currently
>>in. So the fact that it's rebooting should be irrelevant. If anything bad happens
>>then it's a bug in QEMU most likely
>>
Here's the detailed problem information:
1 qemu failed to 'cont' on the dest side
[2016-07-17 00:16:20] monitor_qapi_event_emit:477 {"timestamp": {"seconds": 1468685780, "microseconds": 893931}, "event": "MIGRATION", "data": {"status": "completed"}}
[2016-07-17 00:16:21] handle_qmp_command:3925 qmp_cmd_name: query-chardev
[2016-07-17 00:16:21] handle_qmp_command:3925 qmp_cmd_name: cont //'cont' failed anyway.
2016-07-16 16:16:21.065+0000: shutting down
2016-07-16T16:16:21.065735Z qemu-kvm: terminating on signal 15 from pid 159755(NULL)
2 libvirt got the detailed message as follows:
2016-07-16 16:16:05.875+0000: libvirtd : 159756: info :
virSecurityDACSetOwnershipInternal:290 : Setting DAC user and group on
'/mnt/sdb/subo/migrate/sles.raw' to '0:0'
2016-07-16 16:16:21.061+0000: libvirtd : 159757: warning :
qemuDomainObjEnterMonitorInternal:2358 : This thread seems to be the async
job owner; entering monitor without asking for a nested job is dangerous
2016-07-16 16:16:21.065+0000: libvirtd : 159757: error :
qemuMonitorJSONCheckError:387 : internal error: unable to execute QEMU
command 'cont': Resetting the Virtual Machine is required //RESET is needed,
it's said.
2016-07-16 16:16:21.265+0000: libvirtd : 159757: info :
virSecurityDACRestoreFileLabelInternal:388 : Restoring DAC user and group on
'/mnt/sdb/subo/migrate/sles.raw'
2016-07-16 16:16:21.265+0000: libvirtd : 159757: info :
virSecurityDACSetOwnershipInternal:290 : Setting DAC user and group on
'/mnt/sdb/subo/migrate/sles.raw' to '0:0'
2016-07-16 16:16:21.274+0000: libvirtd : 159757: info :
qemuMigrationJobFinish:6661 : free async job in a migration with dommain
sles11sp3
----
We found that the global_state in qemu is involved, so we made qemu not send the
global state to the dest (i.e. back to the qemu v2.2 behaviour):
static bool global_state_needed(void *opaque)
{
    /* experiment: never send global_state to the dest */
    return false;
}
Then we tried reboot+migration again, and found that libvirt reports:
2016-07-16 16:01:43.001: libvirtd : 21192: info : doPeer2PeerMigrate3:5491 :
domain sles11sp3 migrate in Confirm3 0x7f46374ac110 cancelled=0
vm=0x7f4614000920
2016-07-16 16:01:43.220: libvirtd : 21192: info : qemuMigrationJobFinish:6661 :
free async job in a migration with dommain sles11sp3
2016-07-16 16:01:43.220: libvirtd : 11807: error : qemuProcessFakeReboot:541 :
internal error: guest unexpectedly quit //This is because
qemuProcessFakeReboot requires the JOB LOCK, which is held by the migration
thread. By the time the migration thread releases the lock, the guest has
already been shut off, so libvirt reports this error.
So, together with what I described in my last mail: if libvirt *fails to send
the qmp command 'reset' or priv->fakeReboot to the dest*, reboot+migration may
not work well.
My idea is this: when libvirt gets the 'shutdown' event from qemu during
migration, it records 'shutdown' and 'priv->fakeReboot' in the guest XML sent
to the dest, and when the libvirtd on the dest side finds them in the guest
XML, it sends 'reset' to qemu before 'cont'.
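Roughly, the dest-side decision could look like the sketch below. This is
hypothetical: toy_migration_hint and toy_plan_resume do not exist in libvirt,
the hint stands for whatever would be carried in the guest XML, and the strings
are the informal command names used in this mail rather than exact QMP command
names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: neither the struct nor the function exists in
 * libvirt. It only models the ordering proposed above. */
typedef struct {
    bool shutdown_seen; /* source saw 'shutdown' during migration */
    bool fake_reboot;   /* source had priv->fakeReboot set */
} toy_migration_hint;

/* Fill 'cmds' with the commands the dest would send, in order, and
 * return how many entries were written (at most 2). */
static size_t toy_plan_resume(const toy_migration_hint *hint,
                              const char *cmds[2])
{
    size_t n = 0;

    assert(hint != NULL && cmds != NULL);
    if (hint->shutdown_seen && hint->fake_reboot)
        cmds[n++] = "reset"; /* clear the "reset required" state first */
    cmds[n++] = "cont";
    return n;
}
```

For a guest that was rebooting, the plan is "reset" followed by "cont";
otherwise it is just "cont". The case of a plain shutdown during migration
(shutdown_seen without fake_reboot) is deliberately left out of this sketch.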
But this doesn't seem like an elegant solution.
What would you suggest? Thanks in advance.
Zhang Bo(Oscar)
>
>Thanks, Daniel!
>This answer helps me find the right direction.
>
>There's one exception: libvirt may also be responsible for migration/reboot
>problems,
>because libvirt and qemu cooperate to complete the reboot
>job:
>1 libvirt sends the "powerdown" qmp command to qemu, and sets
>vm->priv->fakeReboot to 1.
>2 the guest starts to shut down
>3 when the guest has shut off, qemu sends the "shutdown" monitor message back to
>libvirt
>4 libvirt gets the message and checks whether vm->priv->fakeReboot is 1; if so, it
>sends the "reset" qmp command to qemu (otherwise it sends "poweroff")
>
>The reboot job is not atomic: libvirt->qemu->libvirt->qemu, with several
>rounds of message sending/receiving.
>
>So, consider the migration + reboot situation:
>1 libvirt sends the "powerdown" qmp command to qemu, and sets
>vm->priv->fakeReboot to 1.
>2 the guest starts to shut down
>3 libvirt migrates the guest to the destination, BUT *priv->fakeReboot is not
>sent to the dest*
>4 the guest is migrated to the dest
>5 the guest shuts down completely, and qemu now sends the
>"shutdown" monitor message to libvirt
>6 libvirt on the dest side gets the message and finds that vm->priv->fakeReboot
>is 0, so it sends "poweroff" rather than "reset" to qemu; the guest gets
>shut off rather than rebooted.
>
>EXPECTED:
> Guest got rebooted on the dest
>ACTUAL:
> Guest got shutoff on the dest.
>
>So, shall we send privateData to the dest as well? I found that libvirt parses
>privateData when it reloads/starts; could we also send/parse it during
>migration?
>
>Thanks
>
>Zhang Bo(Oscar)