All,
Short version: I want to auto-restart a particular guest any time it shuts
down. libvirt hooks can't call 'virsh start foo'. What is a good/simple
way to do this?
Long version:
I have what is probably an unusual set of requirements. To summarize
briefly:
I have an OS image that runs 3rd party binaries that I can't modify. These
binaries occasionally get the system into a state that cannot be resolved
by restarting the software.
We can detect the error state, but not prevent it or fix either the OS or
the software. (The joys of closed-source software... sigh.) For a long
time this has been running on real hardware, and a small monitor was
written that will initiate a reboot of the system when it gets in that
state.
I ALSO use a custom qemu command line option: -snapshot so that I throw
away all changes each time the system shuts down. These are all part of a
compute cluster and no non-transient data is ever on these systems.
What I'm trying to do: use snapshots AND on every OS boot, start fresh from
the base image.
It works if I give two custom options to qemu-kvm (-snapshot -no-reboot).
Every time I shut the system down and then start it, I get back to the
starting image. But if the error happens, I either shut down (with the
-no-reboot option) and stay down, or I do a warm boot and don't discard all
filesystem changes.
So this looked to me like a place to use libvirt hooks (CentOS 7, I had to
create /etc/libvirt/hooks and restart the libvirtd service). I made a qemu
hook that watches for this dom name to get a release event so that I could
auto-restart it. Of course you probably already know what I didn't - you
can't call 'virsh start' safely from a libvirt hook.
So, how can I make libvirt auto-restart a dom? A libvirt hook doesn't seem
to be the way.
Several options occur:
I might make the hook spawn something in the background that sleeps 5
seconds and then does a virsh start - presumably some time after the hook
script exits. This seems somewhat problematic, and not reliable.
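To make that first option concrete, here is a rough sketch of what such a qemu hook might look like. It assumes python3 is installed on the host, that the guest is named 'foo' as in the short version above, and that libvirt calls the hook as "qemu <guest> release end -" once the guest's resources are released. The 5 second delay is a guess, not a value libvirt guarantees to be sufficient, and the helper names are made up for illustration.

```python
#!/usr/bin/env python3
"""Sketch of /etc/libvirt/hooks/qemu: schedule a delayed 'virsh start'."""
import subprocess
import sys

GUEST = "foo"        # assumed guest name, taken from the example above
DELAY_SECONDS = 5    # guess at how long libvirt needs to finish its cleanup


def should_restart(argv):
    """True when the hook was invoked as: qemu <GUEST> release end ..."""
    return len(argv) >= 4 and argv[1] == GUEST and argv[2:4] == ["release", "end"]


def restart_command(guest, delay):
    """Command line that sleeps for `delay` seconds, then starts the guest."""
    return ["sh", "-c", 'sleep "$1"; exec virsh start "$2"', "sh", str(delay), guest]


def main(argv):
    if should_restart(argv):
        # start_new_session detaches the child from the hook, so the hook
        # returns right away and virsh only runs after the hook has exited.
        subprocess.Popen(
            restart_command(GUEST, DELAY_SECONDS),
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            start_new_session=True,
        )


if __name__ == "__main__":
    main(sys.argv)
```

Nothing here checks that the start actually succeeded, which is part of why this approach does not look reliable to me.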
I could make a cron-job that checks every minute and just restarts it. I'd
like the restart to be faster than an average of 30 seconds though.
I could replace the <emulator>/.....</emulator> in the dom definition with
a script that wraps the real emulator in a loop, maybe...
I could write a libvirt API-consuming application that watches for the
right events (reboot and shutdown) and then does the right thing.
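For that last option, a minimal sketch using the libvirt-python bindings might look like the following. It assumes the bindings are installed, that the guest is a persistent (defined) domain reachable at qemu:///system, and that with -no-reboot every guest shutdown or reboot ends in a "stopped" lifecycle event. The function names are hypothetical, and there is no error handling for a failed start.

```python
#!/usr/bin/env python3
"""Sketch: restart one guest whenever libvirt reports that it has stopped."""
import sys


def make_callback(guest, stopped_event, pending):
    """Build a lifecycle callback that queues `guest` when it stops."""
    def on_lifecycle(conn, dom, event, detail, opaque):
        if dom.name() == guest and event == stopped_event:
            pending.append(guest)
    return on_lifecycle


def watch(guest, uri="qemu:///system"):
    # Imported here so the helper above can be used without the bindings.
    import libvirt

    libvirt.virEventRegisterDefaultImpl()   # must precede opening the connection
    conn = libvirt.open(uri)
    pending = []
    conn.domainEventRegisterAny(
        None,                                # None = events for all domains
        libvirt.VIR_DOMAIN_EVENT_ID_LIFECYCLE,
        make_callback(guest, libvirt.VIR_DOMAIN_EVENT_STOPPED, pending),
        None,
    )
    while True:
        libvirt.virEventRunDefaultImpl()     # runs one event loop iteration
        # Start the guest outside the callback, after the event is handled.
        while pending:
            conn.lookupByName(pending.pop()).create()


if __name__ == "__main__" and len(sys.argv) == 2:
    watch(sys.argv[1])
```

It would be run as something like "python3 watcher.py foo" (the script name is made up), most likely from a service unit so that it comes back if it dies. Unlike the cron idea, the restart happens as soon as the event arrives.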
Other ideas?
Fred Clift