On Thu, 2016-09-15 at 18:10 +0200, Benoit wrote:
Hi Andrea,
Hi,
sorry for taking so long to reply.
I'm still unable to reproduce your failure, which of course
makes it very difficult to debug :(
But I might have a couple leads, so let's try and move
forward with that.
> Since you're running an Arch variant, I'm going to
assume
> all your software is basically at the latest version all
> around, isn't it?
Yep exactly, all software / kernel are in latest version
From the log below it looks like you're running 2.1.0, which
might be the latest version your distribution provides, but
the same is not true when it comes to upstream development.
> Can you try adding
>
> [Unit]
> Requires=libvirtd.service
>
> to your libvirt-guests.service via 'systemctl edit' and
> see if it helps? I don't think it should be needed for your
> use case, but it's probably a good idea to have it there
> anyways.
I have tried and as you expected it didn't changed anything, same issue.
Can you please provide the output of
# systemctl cat libvirt-guests.service
and
# systemctl show libvirt-guests.service
as well? Those could contain helpful information.
I made more test and it is very reproductible, let me give you my
steps
to reproduce it :
-I shutdown -r now my host.
-when it restarted, first thing to check is : systemctl status
libvirt-guests.service, I got :
libvirt-guests.service - Suspend Active Libvirt Guests
Loaded: loaded (/usr/lib/systemd/system/libvirt-guests.service;
enabled; vendor preset: disabled)
From the output of systemctl it looks like you didn't add
the Requires=libvirtd.service relationship. Maybe you just
removed it after testing?
Active: failed (Result: exit-code) since Thu 2016-09-15
17:40:07
CEST; 1min 47s ago
Docs: man:libvirtd(8)
http://libvirt.org
Main PID: 329 (code=exited, status=1/FAILURE)
Sep 15 17:40:07 libvirt-guests.sh[329]: Resuming guest : error: failed
to connect to the hypervisor
Sep 15 17:40:07 libvirt-guests.sh[329]: error: no valid connection
Sep 15 17:40:07 libvirt-guests.sh[329]: error: Failed to connect socket
to '/var/run/libvirt/libvirt-sock': No such file or directory
Sep 15 17:40:07 libvirt-guests.sh[329]: Resuming guest : error: failed
to connect to the hypervisor
Sep 15 17:40:07 libvirt-guests.sh[329]: error: no valid connection
Sep 15 17:40:07 libvirt-guests.sh[329]: error: Failed to connect socket
to '/var/run/libvirt/libvirt-sock': No such file or directory
Sep 15 17:40:07 systemd[1]: libvirt-guests.service: Main process exited,
code=exited, status=1/FAILURE
Sep 15 17:40:07 systemd[1]: Failed to start Suspend Active Libvirt Guests.
Sep 15 17:40:07 systemd[1]: libvirt-guests.service: Unit entered failed
state.
Sep 15 17:40:07 systemd[1]: libvirt-guests.service: Failed with result
'exit-code'.
Even if the service is in FAILED mode, my guests are started correctly.
Is it possible that you have configured some (all?) of your
guests to be autostarted? That would explain why the guests
are running after reboot even though libvirt-guests failed
to connect to libvirtd.
Even then, libvirt-guests is written in such a way that the
fact a guest has already been started should not result in
failure.
I've checked, and it looks to me like the socket is created
very early in the libvirtd startup process, way before
attempting to start autostarted guests, so I don't see how
that could cause issues.
Two things stand out from the log:
1) Resuming guest : error: failed to connect to the hypervisor
That progress message is supposed to contain the guest
name right after "guest", but you have an empty string
instead. That's consistent with the script being unable
to connect to libvirtd
2) the script can't connect to libvirtd
which is extremely weird because, right before looping
throught the guests and starting them, a connectivity
check is performed to make sure libvirtd can be reached
Can you please provide a full, verbose libvirtd log[1]?
Can you also add[2]
set -x
to
/usr/libexec/libvirt-guests.sh
and then provide the output of
# systemctl -b 0 -u libvirt-guests.service
after rebooting the host?
If I do a systemctl restart libvirt-guests.service, everything is
good :
libvirt-guests.service - Suspend Active Libvirt Guests
Loaded: loaded (/usr/lib/systemd/system/libvirt-guests.service;
enabled; vendor preset: disabled)
Active: active (exited) since Thu 2016-09-15 17:42:01 CEST; 9min ago
Docs: man:libvirtd(8)
http://libvirt.org
Main PID: 605 (code=exited, status=0/SUCCESS)
Sep 15 17:42:01 systemd[1]: Starting Suspend Active Libvirt Guests...
Sep 15 17:42:01 systemd[1]: Started Suspend Active Libvirt Guests.
Now if I do a shutdown -r now again, I can see both in my libvirt logs +
in my guests log that an ACPI commands has been sent from libvirt and my
guests shutdown gently, so everything is ok.
When I start again, and try to do a systemctl status
libvirt-guests.service, I got the same FAILED status as before....
I made a research on my logs :
Sep 15 17:39:29 libvirt-guests.sh[1164]: Shutting down guests on default
URI...
Sep 15 17:39:29 libvirt-guests.sh[1164]: Starting shutdown on guest: ape
Sep 15 17:39:30 libvirt-guests.sh[1164]: Waiting for guest ape to shut
down, 30 seconds left
Looks like you changed at least a couple of settings here,
eg. the timeout. Can you please provide the output of
$ grep -Ev '^(#|$)' /etc/sysconfig/libvirt-guests
so we can be sure we're on the same page?
Sep 15 17:39:35 libvirt-guests.sh[1164]: Waiting for guest ape to
shut
down, 25 seconds left
Sep 15 17:39:36 libvirtd[326]: internal error: End of file from monitor
Sep 15 17:39:36 libvirtd[326]: Cannot write data: Broken pipe
[...]
We see the same error as systemctl status plus some error like
"
internal error: End of file from monitor" and others type.
Don't know if I have anything to take a look there ?
That might be a red herring, I think it's just the QEMU
process being killed during shutdown and thus ending
communication with libvirtd abruptly.
Finally, I also notice some very strange thing:
If I stop the service with systemctl stop libvirt-guests.service it
tooks a while to execute the stop command and fails.
Sep 15 18:00:44 libvirt-guests.sh[978]: Starting shutdown on guest: ape
Sep 15 18:00:45 libvirt-guests.sh[978]: Waiting for guest ape to shut
down, 30 seconds left
Sep 15 18:00:50 libvirt-guests.sh[978]: Waiting for guest ape to shut
down, 25 seconds left
Sep 15 18:00:55 libvirt-guests.sh[978]: Waiting for guest ape to shut
down, 20 seconds left
Sep 15 18:01:00 libvirt-guests.sh[978]: Waiting for guest ape to shut
down, 15 seconds left
Sep 15 18:01:06 libvirt-guests.sh[978]: Waiting for guest ape to shut
down, 10 seconds left
Sep 15 18:01:11 libvirt-guests.sh[978]: Waiting for guest ape to shut
down, 5 seconds left
Sep 15 18:01:15 libvirt-guests.sh[978]: Shutdown of guest ape failed to
complete in time.
So my guests are not able to shutdown properly but when I shutdown -r
now my host I can see that my guests have been shutdown gently, I guess
there is something related on my issue....
It looks like at least one of the guests is taking awfully
long to shut down. What if you bump up the timeout? Does it
manage to shut down before reaching it?
[1] Follow instructions at
http://wiki.libvirt.org/page/DebugLogs
[2] When editing the file, add the new instruction *right
after* the first line. Of course you should make a backup
copy of the file before tweaking it
--
Andrea Bolognani / Red Hat / Virtualization