I've noticed that in some cases systemd is fast enough that my guests
are not resumed, even though libvirt-guests.service is ordered after
libvirtd.service: libvirt-guests.sh simply fails to connect. The
reason is simple. systemd correctly starts libvirt-guests after it
execs libvirtd, but the daemon cannot accept connections right away,
because it first performs some initialization that may take a while.
This problem is not limited to systemd. Any init system that can
start services in parallel (e.g. OpenRC) may run into the same
situation.

The fix is to try connecting not just once but several times, with a
short sleep between attempts. The number of attempts and the delay
are controlled by the new CONNECT_RETRIES and RETRIES_SLEEP variables
(10 attempts and 0.5 seconds by default). Like the other settings,
they can be overridden in the sysconfig file sourced by the script.

Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com>
---
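For anyone who wants to exercise the retry logic that test_connect() gains in this patch without a running libvirtd, here is a minimal standalone POSIX sh sketch. It is not the script itself. probe() and READY_AFTER are hypothetical stand-ins for `run_virsh "$uri" connect` and a daemon that finishes initializing after a few attempts. Plain echo replaces eval_gettext, which comes from gettext.sh. RETRIES_SLEEP defaults to 0 here so the demo finishes instantly.

```shell
#!/bin/sh
# Standalone sketch of the connect-retry loop; no libvirt needed.

CONNECT_RETRIES=${CONNECT_RETRIES:-10}
RETRIES_SLEEP=${RETRIES_SLEEP:-0}   # 0 so the demo runs instantly

READY_AFTER=3   # hypothetical: the "daemon" accepts the 3rd attempt
attempts=0

# probe: hypothetical stand-in for `run_virsh "$uri" connect`.
# Fails until READY_AFTER attempts have been made.
probe()
{
    attempts=$((attempts + 1))
    [ "$attempts" -ge "$READY_AFTER" ]
}

# retry_connect: call probe up to CONNECT_RETRIES times, sleeping
# RETRIES_SLEEP seconds after each failure. Returns 0 on the first
# success, or 1 once every attempt has failed.
retry_connect()
{
    i=0
    while [ "$i" -lt "$CONNECT_RETRIES" ]; do
        probe && return 0
        i=$((i + 1))
        echo "Unable to connect currently. Retrying .. $i" >&2
        sleep "$RETRIES_SLEEP"
    done
    echo "Can't connect. Skipping." >&2
    return 1
}

if retry_connect; then
    echo "connected after $attempts attempts"
else
    echo "gave up after $attempts attempts"
fi
```

Run as is, it prints two "Retrying" lines on stderr and then "connected after 3 attempts" on stdout. Setting READY_AFTER above CONNECT_RETRIES exercises the give-up path instead. The loop avoids bash-only constructs such as `for ((...))`, so it also runs under dash or another plain /bin/sh.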
tools/libvirt-guests.sh.in | 19 +++++++++++++------
1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/tools/libvirt-guests.sh.in b/tools/libvirt-guests.sh.in
index 38e93c5..f14598e 100644
--- a/tools/libvirt-guests.sh.in
+++ b/tools/libvirt-guests.sh.in
@@ -37,6 +37,8 @@ SHUTDOWN_TIMEOUT=300
 PARALLEL_SHUTDOWN=0
 START_DELAY=0
 BYPASS_CACHE=0
+CONNECT_RETRIES=10
+RETRIES_SLEEP=.5

 test -f "$sysconfdir"/sysconfig/libvirt-guests &&
     . "$sysconfdir"/sysconfig/libvirt-guests
@@ -87,12 +89,17 @@ test_connect()
 {
     uri=$1

-    run_virsh "$uri" connect 2>/dev/null
-    if [ $? -ne 0 ]; then
-        eval_gettext "Can't connect to \$uri. Skipping."
-        echo
-        return 1
-    fi
+    i=0
+    while [ $i -lt ${CONNECT_RETRIES} ]; do
+        run_virsh "$uri" connect 2>/dev/null && return 0
+        i=$((i + 1))
+        eval_gettext "Unable to connect to libvirt currently. Retrying .. \$i"
+        echo
+        sleep ${RETRIES_SLEEP}
+    done
+    eval_gettext "Can't connect to \$uri. Skipping."
+    echo
+    return 1
 }

 # list_guests URI PERSISTENT
--
1.9.0