On Tue, May 05, 2009 at 11:38:13PM -0500, Matthew Farrellee wrote:
> Daniel P. Berrange wrote:
> > On Tue, May 05, 2009 at 04:13:38PM -0400, Hugh O. Brock wrote:
> >> Not too long ago we took a patch that allowed QEMU VMs to keep running
> >> even if libvirtd died or was restarted.
> >>
> >> I was talking to Matt Farrellee (cc'd) this afternoon about
> >> manageability, and he feels fairly strongly that this behavior should be
> >> optional -- in other words, it should be possible to guarantee that if
> >> libvirtd dies, it will take all the VMs with the "die-with-libvirtd"
> >> flag set down with it.
> >>
> >> I'm not sure this API is portable to Xen, but it would work on any
> >> hypervisor that represents the VM as a normal process.
> >>
> >> Does this strike anyone else as useful behavior?
> >
> > This isn't really a model we want in the architecture. That the QEMU
> > instances used to die when libvirtd died was an unfortunate artifact
> > of the fact that QEMU was the parent process leader. These days all VMs
> > are fully daemonized, so there is no parent/child relationship. In fact
> > QEMU was really the odd-ball in this respect, because with Xen/OpenVZ/LXC
> > and VirtualBox, VMs have always happily continued when libvirtd stopped
> > or died, as do storage pools and virtual networks.
> >
> > This is important because it ensures we can automatically restart the
> > libvirtd daemon during RPM upgrades, and provides robustness should a
> > bug cause the daemon to crash - the daemon can be trivially restarted
> > and continue with no interruption to services being managed.
> >
> It doesn't appear to be the case that the libvirtd daemon can trivially
> restart and continue with no interruptions. Right now it loses track of VMs.

That is a bug then. If you can reproduce it, please file a BZ ticket
so we can track it down & fix it.

> In a scenario where VMs are not deployed and locked to specific physical
> nodes, it can be highly valuable to have ways to ensure a VM is no
> longer running when a layer of its management stops functioning.

IMHO this is a problem to be solved by clustering software. If the
clustering software detects a failure with the management service,
then it should power fence the entire node. Relying on management
service failure to kill the VMs will never be reliable enough.
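
As a rough sketch of that division of labour, assuming the monitor runs
on a peer node or cluster manager rather than on the host being watched:
the names service_is_healthy, monitor_once, check_cmd and fence_node are
all hypothetical, and fence_node stands in for whatever power-fencing
agent the cluster actually uses. Nothing here is a libvirt or clustering
API.

```python
import subprocess
from typing import Callable, Sequence


def service_is_healthy(check_cmd: Sequence[str], timeout: float = 5.0) -> bool:
    """Run a (hypothetical) health-check command against the management
    service; treat a non-zero exit, a timeout, or a missing command as failure."""
    try:
        result = subprocess.run(
            check_cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def monitor_once(
    failures: int,
    healthy: bool,
    max_failures: int,
    fence_node: Callable[[], None],
) -> int:
    """Update the count of consecutive failed checks and return it.

    A healthy check resets the count. Once the count reaches max_failures,
    the whole node is fenced via the caller-supplied fence_node callback;
    no attempt is made to kill individual VMs.
    """
    if healthy:
        return 0
    failures += 1
    if failures >= max_failures:
        fence_node()
    return failures
```

The point of the sketch is only that the decision and the fencing action
live outside the node whose management service has failed, so they do
not depend on that service being able to clean up after itself.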
Daniel
--
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|