On Tue, May 12, 2015 at 04:14:37PM +0800, zhang bo wrote:
The event poll may become un-triggerable after the system clock is changed.
The steps to reproduce the problem:
1 run event-test
2 define and start a domain named vm1
3 destroy vm1
4 set the system time back by 1 hour after timer.expiresAt has been set in
virEventPollUpdateTimeout() (and before virEventPollDispatchTimeouts() runs)
5 event-test will receive no messages until 1 hour later
The reasons for the problem are:
1 The value of timer.expiresAt is set by virTimeMillisNowRaw. virTimeMillisNowRaw is
affected by settimeofday() because it uses CLOCK_REALTIME to get the time.
2 If we set the system time back to a much earlier time after timer.expiresAt has
been set, timer.expiresAt is not affected, while the time read by
virEventPollDispatchTimeouts is.
Suppose it's now May 12th and we set the clock back to the 10th: expiresAt is still
the 12th, but the time virEventPollDispatchTimeouts gets is the 10th.
if (eventLoop.timeouts[i].expiresAt <= (now+20)) {
    // expiresAt will not be less than now until 2 days later.
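To make the arithmetic concrete, here is a minimal sketch of the same comparison as a standalone function. The name timer_is_due and the millisecond parameters are made up for illustration; this is not libvirt's actual code, only the quoted condition lifted out of the loop.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the quoted check: a timer is dispatched once its
 * expiry is no more than 20 ms ahead of the current wall-clock reading.
 * Both values are milliseconds since the epoch. */
bool timer_is_due(unsigned long long expires_at_ms,
                  unsigned long long now_ms)
{
    return expires_at_ms <= now_ms + 20;
}
```

With an expiry armed 1 second ahead, timer_is_due() becomes true once now reaches the expiry. If the clock is then stepped back by an hour, now is 3,600,000 ms smaller, so the same call keeps returning false until the wall clock has caught up again.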
*Solution (not good enough)*:
1 change the clock used in virTimeMillisNowRaw from CLOCK_REALTIME to CLOCK_MONOTONIC,
which is not affected by settimeofday().
2 add the system start time (relative to the epoch) to the value returned by
clock_gettime(CLOCK_MONOTONIC), making it equal to the value obtained from CLOCK_REALTIME.
3 since the timestamps of log messages should follow the system time, keep those on
CLOCK_REALTIME as before.
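Step 1 might look roughly like the sketch below. The function name monotonic_millis_now is hypothetical and this is not the actual libvirt implementation; it only shows reading CLOCK_MONOTONIC through the POSIX clock_gettime() call and converting the result to milliseconds.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Hypothetical helper: store the current CLOCK_MONOTONIC reading, in
 * milliseconds, in *now. Returns 0 on success, -1 if clock_gettime() fails.
 * The value counts from an unspecified starting point, so it is only
 * meaningful when compared with other readings of the same clock. */
int monotonic_millis_now(unsigned long long *now)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) < 0)
        return -1;

    *now = (unsigned long long)ts.tv_sec * 1000ULL +
           (unsigned long long)ts.tv_nsec / 1000000ULL;
    return 0;
}
```

Because both expiresAt and the later "now" would come from the same monotonic clock, a settimeofday() call between arming and dispatching a timer would not change their difference.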
However, there are still problems:
1 pthread_cond_wait() gets the time from CLOCK_REALTIME. When we change the system time,
pthread_cond_wait() may still be affected.
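For reference, only the timed variant, pthread_cond_timedwait(), takes a deadline, and POSIX allows a condition variable to measure that deadline against CLOCK_MONOTONIC through pthread_condattr_setclock(). The sketch below assumes a platform that supports that call; the helper name wait_monotonic_ms is made up, and nothing here is taken from libvirt.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical helper: block on a condition variable bound to CLOCK_MONOTONIC
 * until timeout_ms has elapsed. Nothing ever signals the condition in this
 * sketch, so the expected return value is ETIMEDOUT; any other value is an
 * error code from one of the pthread calls. */
int wait_monotonic_ms(long timeout_ms)
{
    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    pthread_condattr_t attr;
    pthread_cond_t cond;
    struct timespec deadline;
    int rc;

    if ((rc = pthread_condattr_init(&attr)) != 0)
        return rc;
    /* Deadlines for this condition variable are read from CLOCK_MONOTONIC. */
    rc = pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);
    if (rc == 0)
        rc = pthread_cond_init(&cond, &attr);
    pthread_condattr_destroy(&attr);
    if (rc != 0)
        return rc;

    /* The absolute deadline must be computed from the same clock. */
    clock_gettime(CLOCK_MONOTONIC, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&lock);
    /* A return of 0 here can only be a spurious wakeup, so wait again. */
    do {
        rc = pthread_cond_timedwait(&cond, &lock, &deadline);
    } while (rc == 0);
    pthread_mutex_unlock(&lock);

    pthread_cond_destroy(&cond);
    return rc;
}
```

This only covers condition variables the application creates itself; waits inside other libraries would still use whichever clock those libraries chose.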
So, is there a better solution? Thanks in advance.
Simply don't change the system time by massive deltas. Libvirt is not going
to be the only app to be affected. As you mention it is going to hit the
pthread_cond_wait() call which will likely affect pretty much every single
non-trivial process running on the system. I'd expect other apps have much
the same problem with calculating poll sleeps too.
If you need to massively change the system time, this should be done in
single user mode, or by doing a reboot. Once a system is running it should be
kept synced with NTPD, which will only ever change system time in very
small increments and so will not cause this problem.
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|