Hi everyone,
During a live migration, Linux guests frequently get stuck and become
unresponsive, and the host CPU utilization for that guest goes to 100%.
Sometimes they recover, and dmesg then shows that there was a clock
problem during the live migration:
Clocksource tsc unstable (delta = 35882846234 ns)
So the TSC jumped by nearly 36 seconds.
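
For anyone who wants to check the same thing in their own guest logs, here
is a minimal sketch that pulls the delta out of such a dmesg line and
converts it to seconds. The function name tsc_delta_seconds is made up for
illustration, and the regular expression assumes the message format shown
above; other kernel versions may word it differently.

```python
import re

# Hypothetical helper (not part of any tool): extract the delta in
# nanoseconds from a "Clocksource ... unstable" line and return seconds.
def tsc_delta_seconds(line):
    match = re.search(r"delta\s*=\s*(-?\d+)\s*ns", line)
    if match is None:
        return None  # line does not carry a delta in the assumed format
    return int(match.group(1)) / 1e9

line = "Clocksource tsc unstable (delta = 35882846234 ns)"
print(round(tsc_delta_seconds(line), 2))  # prints 35.88
```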
Migrations often fail when going from server A to B, but will then work
fine in the other direction.
Both servers are synchronized to the same NTP source, and their clocks
are well within 1 ms of one another.
Both hosts are running Ubuntu 13.04 with these versions (from Ubuntu
packages):
Kernel: 3.8.0-35-generic x86_64
Libvirt: 1.0.2
Qemu: 1.4.0
Gluster-fs: 3.4.2 (libvirt accesses the images via the filesystem, not
using libgfapi yet).
The interconnect between both machines (both for migration and gluster)
is 10GbE.
We have different guests (all Ubuntu releases, 13.04 and 13.10), and
they all seem to be affected.
Clocksource: kvm-clock on all guests.
Clock entry from the guest XML: <clock offset='utc'/>
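
For reference, a small sketch of how the active clocksource can be read
inside a guest. It assumes the standard Linux sysfs location
/sys/devices/system/clocksource/clocksource0; the function name
read_clocksource and its base parameter are only illustrative.

```python
from pathlib import Path

# Standard sysfs directory for the first clocksource device on Linux.
SYSFS = Path("/sys/devices/system/clocksource/clocksource0")

def read_clocksource(base=SYSFS):
    """Return (current, available) clocksources found under `base`."""
    base = Path(base)
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available

if __name__ == "__main__":
    try:
        current, available = read_clocksource()
        print("current:", current)
        print("available:", ", ".join(available))
    except OSError as exc:
        # Not on Linux, or sysfs is not mounted where we assumed.
        print("could not read clocksource info:", exc)
```

On the guests described here, the current value would be expected to read
kvm-clock.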
As far as I can tell from the kvm-clock documentation, it specifically
supports live migration, so I'm a bit surprised by these problems. There
isn't much information to be found on this issue. I have found postings
by others who seem to have run into the same problem, but none of them
came with a solution.
Any help would be much appreciated.
Regards, Paul Boven.
--
Paul Boven <boven(a)jive.nl> +31 (0)521-596547
Unix/Linux/Networking specialist
Joint Institute for VLBI in Europe -
www.jive.nl
VLBI - It's a fringe science