I'm using KVM, libvirt and DRBD to build a custom mirroring and live
migration setup.

I have two servers, each running KVM and libvirt, and both using DRBD
to mirror guests (which are stored on raw DRBD partitions). If I shut
down a guest on one server, I can bring it up on the other without any
problems.

However, I run into file system corruption when I try to do a live
migration. I have a test script that exercises just the bare
functionality I'm looking for (it runs on the second KVM server,
kvmserv02):

#!/bin/sh

# Only act if the guest is currently running on the remote server.
if virsh -c qemu+ssh://kvmserv01/system list | grep -q " guest "; then
    echo "Saving state on remote server..."
    if ! virsh -c qemu+ssh://kvmserv01/system save guest --running ~/guest.save; then
        echo "Could not save state of guest"
        exit 1
    fi

    # Demote the remote DRBD node so the local node can take over.
    echo "Setting remote drbd node to secondary..."
    if ! ssh root@kvmserv01 "drbdadm secondary guest"; then
        echo "Could not change remote host to secondary for guest's DRBD partition"
        exit 1
    fi
    sleep 5

    echo "Setting local drbd node to primary..."
    if ! drbdadm primary guest; then
        echo "Could not change this host to primary for guest's DRBD partition"
        exit 1
    fi
    sleep 5

    # Resume the guest locally from the saved state file.
    echo "Restoring state on local server..."
    if ! virsh restore ~/guest.save --running; then
        echo "Could not restore guest state"
        exit 1
    fi

    echo "Deleting state file on local server..."
    rm -f ~/guest.save
else
    echo 'guest not running on this host'
fi
# End of script

The state is restored and the guest comes up on the second server, but
I end up with file system corruption. My first test guest had its file
system pretty heavily damaged after a few moves back and forth.

What am I missing here? Or is this simply a really horrible way to do it?
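
For what it's worth, one variation on the script above would be to
replace the fixed "sleep 5" pauses with an explicit check of the DRBD
disk state before promoting the local node. The following is only a
minimal sketch. The function name wait_for_uptodate and the 30-try
default are made up for illustration, and it assumes that
"drbdadm dstate <resource>" prints the local and peer disk states as
"UpToDate/UpToDate" once both sides are in sync. I don't know whether
this has anything to do with the corruption.

```shell
#!/bin/sh
# Hypothetical helper (not part of the script above): poll a DRBD
# resource until both sides report UpToDate, or give up after a number
# of one-second tries.
# Assumption: "drbdadm dstate <res>" prints "Local/Peer" disk states,
# e.g. "UpToDate/UpToDate".
wait_for_uptodate() {
    res=$1
    tries=${2:-30}      # how many one-second polls before giving up
    state=""
    while [ "$tries" -gt 0 ]; do
        state=$(drbdadm dstate "$res" 2>/dev/null)
        if [ "$state" = "UpToDate/UpToDate" ]; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    echo "DRBD resource $res not in sync (last state: ${state:-unknown})" >&2
    return 1
}
```

In the script it could be called as "wait_for_uptodate guest || exit 1"
just before "drbdadm primary guest", in place of the first "sleep 5".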
--
Aaron Clausen
mightymartianca(a)gmail.com