[libvirt-users] systemctl libvirt-guests.service fails to start during host boot

Hi, I have been using qemu and kvm for a while but I am newbie to libvirt. (but I really like it :) I am on Parabola (fork of Archlinux, using systemd) I only got an issue regarding libvirt-guests service, when my host boots about 7 times to 10 I got a issue on the service. ------------------------------------------------------------------------ systemctl status libvirt-guests.service ● libvirt-guests.service - Suspend Active Libvirt Guests Loaded: loaded (/usr/lib/systemd/system/libvirt-guests.service; enabled; vendor preset: disabled) Active: failed (Result: exit-code) since Thu 2016-08-11 10:55:35 CEST; 36s ago Docs: man:libvirtd(8) http://libvirt.org Process: 347 ExecStart=/usr/lib/libvirt/libvirt-guests.sh start (code=exited, status=1/FAILURE) Main PID: 347 (code=exited, status=1/FAILURE) Aug 11 10:55:35 oms_apex_plaisir systemd[1]: Starting Suspend Active Libvirt Guests... Aug 11 10:55:35 oms_apex_plaisir libvirt-guests.sh[347]: Resuming guests on default URI... Aug 11 10:55:35 oms_apex_plaisir libvirt-guests.sh[347]: Resuming guest : error: failed to connect to the hypervisor Aug 11 10:55:35 oms_apex_plaisir libvirt-guests.sh[347]: error: no valid connection Aug 11 10:55:35 oms_apex_plaisir libvirt-guests.sh[347]: error: Failed to connect socket to '/var/run/libvirt/libvirt-sock' no such file Aug 11 10:55:35 oms_apex_plaisir systemd[1]: libvirt-guests.service: Main process exited, code=exited, status=1/FAILURE Aug 11 10:55:35 oms_apex_plaisir systemd[1]: Failed to start Suspend Active Libvirt Guests. Aug 11 10:55:35 oms_apex_plaisir systemd[1]: libvirt-guests.service: Unit entered failed state. Aug 11 10:55:35 oms_apex_plaisir systemd[1]: libvirt-guests.service: Failed with result 'exit-code'. ------------------------------------------------------------------------ it is like the service starts too early. When I restart the service everything is good and if I shutdown it sends the ACPI message correctly to my guests. I already seen lot of discussion but not able to find any working solution.. any idea to help me figure out the issue ? many thanks belette

On Thu, 2016-08-11 at 11:22 +0200, Benoit wrote:
Hi, I have been using qemu and kvm for a while but I am newbie to libvirt. (but I really like it :) I am on Parabola (fork of Archlinux, using systemd) I only got an issue regarding libvirt-guests service, when my host boots about 7 times to 10 I got a issue on the service. systemctl status libvirt-guests.service ● libvirt-guests.service - Suspend Active Libvirt Guests Loaded: loaded (/usr/lib/systemd/system/libvirt-guests.service; enabled; vendor preset: disabled) Active: failed (Result: exit-code) since Thu 2016-08-11 10:55:35 CEST; 36s ago Docs: man:libvirtd(8) http://libvirt.org Process: 347 ExecStart=/usr/lib/libvirt/libvirt-guests.sh start (code=exited, status=1/FAILURE) Main PID: 347 (code=exited, status=1/FAILURE) Aug 11 10:55:35 oms_apex_plaisir systemd[1]: Starting Suspend Active Libvirt Guests... Aug 11 10:55:35 oms_apex_plaisir libvirt-guests.sh[347]: Resuming guests on default URI... Aug 11 10:55:35 oms_apex_plaisir libvirt-guests.sh[347]: Resuming guest : error: failed to connect to the hypervisor Aug 11 10:55:35 oms_apex_plaisir libvirt-guests.sh[347]: error: no valid connection Aug 11 10:55:35 oms_apex_plaisir libvirt-guests.sh[347]: error: Failed to connect socket to '/var/run/libvirt/libvirt-sock' no such file Aug 11 10:55:35 oms_apex_plaisir systemd[1]: libvirt-guests.service: Main process exited, code=exited, status=1/FAILURE Aug 11 10:55:35 oms_apex_plaisir systemd[1]: Failed to start Suspend Active Libvirt Guests. Aug 11 10:55:35 oms_apex_plaisir systemd[1]: libvirt-guests.service: Unit entered failed state. Aug 11 10:55:35 oms_apex_plaisir systemd[1]: libvirt-guests.service: Failed with result 'exit-code'. it is like the service starts too early. When I restart the service everything is good and if I shutdown it sends the ACPI message correctly to my guests. I already seen lot of discussion but not able to find any working solution.. any idea to help me figure out the issue ? many thanks
That's very weird. Apparently, libvirtd is not running by the time libvirt-guests is started, despite the fact that the unit file for the former contains Before=libvirt-guests.service and the one for the latter contains After=libvirtd.service That should be more than enough to ensure the proper ordering is followed when starting / stopping the services. Is the libvirtd service enabled at all? Are there any startup errors / warnings? Does the libvirt-guests service actually bring up the guests that were running before shutdown when you start it a second time? -- Andrea Bolognani / Red Hat / Virtualization

Thanks Andrea, Yes I confirm to you that I have enabled libvirt as well. I don't see any error right now but I have to investigate a little bit more. My guests are in autostart in virsh so everything is fine on this, the only issue I got is in case of shutdown. The strange thing is that sometimes libvirt-guests is up and running sometimes failed... just for info this is my libvirt-guests.service (by default) [Unit] Description=Suspend Active Libvirt Guests After=network.target libvirtd.service time-sync.target Documentation=man:libvirtd(8) Documentation=http://libvirt.org [Service] EnvironmentFile=-/etc/conf.d/libvirt-guests # Hack just call traditional service until we factor # out the code ExecStart=/usr/lib/libvirt/libvirt-guests.sh start ExecStop=/usr/lib/libvirt/libvirt-guests.sh stop Type=oneshot RemainAfterExit=yes StandardOutput=journal+console TimeoutStopSec=0 [Install] WantedBy=multi-user.target On 08/11/2016 06:36 PM, Andrea Bolognani wrote:
That's very weird. Apparently, libvirtd is not running by the time libvirt-guests is started, despite the fact that the unit file for the former contains
Before=libvirt-guests.service
and the one for the latter contains
After=libvirtd.service
That should be more than enough to ensure the proper ordering is followed when starting / stopping the services.
Is the libvirtd service enabled at all? Are there any startup errors / warnings? Does the libvirt-guests service actually bring up the guests that were running before shutdown when you start it a second time?

On Thu, 2016-08-11 at 20:41 +0200, Benoit wrote:
Thanks Andrea, Yes I confirm to you that I have enabled libvirt as well. I don't see any error right now but I have to investigate a little bit more. My guests are in autostart in virsh so everything is fine on this, the only issue I got is in case of shutdown. The strange thing is that sometimes libvirt-guests is up and running sometimes failed...
I'm trying to track this down, but 1) I can't reproduce it on my setup and 2) it doesn't make any sense :) It's almost as if libvirtd.service would be started during boot, but not every single time. Except you always have it running after boot, right? And After=libvirtd.service tells systemd *exactly* to wait until libvirtd has started before starting libvirt-guests, so the situation you apparently find yourself in should never, ever happen, barring a systemd bug. Since you're running an Arch variant, I'm going to assume all your software is basically at the latest version all around, isn't it?
just for info this is my libvirt-guests.service (by default) [Unit] Description=Suspend Active Libvirt Guests After=network.target libvirtd.service time-sync.target Documentation=man:libvirtd(8) Documentation=http://libvirt.org [Service] EnvironmentFile=-/etc/conf.d/libvirt-guests # Hack just call traditional service until we factor # out the code ExecStart=/usr/lib/libvirt/libvirt-guests.sh start ExecStop=/usr/lib/libvirt/libvirt-guests.sh stop Type=oneshot RemainAfterExit=yes StandardOutput=journal+console TimeoutStopSec=0 [Install] WantedBy=multi-user.target
Can you try adding [Unit] Requires=libvirtd.service to your libvirt-guests.service via 'systemctl edit' and see if it helps? I don't think it should be needed for your use case, but it's probably a good idea to have it there anyways. -- Andrea Bolognani / Red Hat / Virtualization

Hi Andrea,
Since you're running an Arch variant, I'm going to assume all your software is basically at the latest version all around, isn't it? Yep exactly, all software / kernel are in latest version Can you try adding
[Unit] Requires=libvirtd.service
to your libvirt-guests.service via 'systemctl edit' and see if it helps? I don't think it should be needed for your use case, but it's probably a good idea to have it there anyways. I have tried and as you expected it didn't changed anything, same issue. I made more test and it is very reproductible, let me give you my steps to reproduce it :
-I shutdown -r now my host. -when it restarted, first thing to check is : systemctl status libvirt-guests.service, I got : libvirt-guests.service - Suspend Active Libvirt Guests Loaded: loaded (/usr/lib/systemd/system/libvirt-guests.service; enabled; vendor preset: disabled) Active: failed (Result: exit-code) since Thu 2016-09-15 17:40:07 CEST; 1min 47s ago Docs: man:libvirtd(8) http://libvirt.org Main PID: 329 (code=exited, status=1/FAILURE) Sep 15 17:40:07 libvirt-guests.sh[329]: Resuming guest : error: failed to connect to the hypervisor Sep 15 17:40:07 libvirt-guests.sh[329]: error: no valid connection Sep 15 17:40:07 libvirt-guests.sh[329]: error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory Sep 15 17:40:07 libvirt-guests.sh[329]: Resuming guest : error: failed to connect to the hypervisor Sep 15 17:40:07 libvirt-guests.sh[329]: error: no valid connection Sep 15 17:40:07 libvirt-guests.sh[329]: error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory Sep 15 17:40:07 systemd[1]: libvirt-guests.service: Main process exited, code=exited, status=1/FAILURE Sep 15 17:40:07 systemd[1]: Failed to start Suspend Active Libvirt Guests. Sep 15 17:40:07 systemd[1]: libvirt-guests.service: Unit entered failed state. Sep 15 17:40:07 systemd[1]: libvirt-guests.service: Failed with result 'exit-code'. Even if the service is in FAILED mode, my guests are started correctly. If I do a systemctl restart libvirt-guests.service, everything is good : libvirt-guests.service - Suspend Active Libvirt Guests Loaded: loaded (/usr/lib/systemd/system/libvirt-guests.service; enabled; vendor preset: disabled) Active: active (exited) since Thu 2016-09-15 17:42:01 CEST; 9min ago Docs: man:libvirtd(8) http://libvirt.org Main PID: 605 (code=exited, status=0/SUCCESS) Sep 15 17:42:01 systemd[1]: Starting Suspend Active Libvirt Guests... Sep 15 17:42:01 systemd[1]: Started Suspend Active Libvirt Guests. Now if I do a shutdown -r now again, I can see both in my libvirt logs + in my guests log that an ACPI commands has been sent from libvirt and my guests shutdown gently, so everything is ok. When I start again, and try to do a systemctl status libvirt-guests.service, I got the same FAILED status as before.... I made a research on my logs : Sep 15 17:39:29 libvirt-guests.sh[1164]: Shutting down guests on default URI... Sep 15 17:39:29 libvirt-guests.sh[1164]: Starting shutdown on guest: ape Sep 15 17:39:30 libvirt-guests.sh[1164]: Waiting for guest ape to shut down, 30 seconds left Sep 15 17:39:35 libvirt-guests.sh[1164]: Waiting for guest ape to shut down, 25 seconds left Sep 15 17:39:36 libvirtd[326]: internal error: End of file from monitor Sep 15 17:39:36 libvirtd[326]: Cannot write data: Broken pipe Sep 15 17:39:36 libvirt-guests.sh[1164]: Shutdown of guest ape complete. Sep 15 17:39:36 libvirt-guests.sh[1164]: Starting shutdown on guest: om Sep 15 17:39:37 libvirt-guests.sh[1164]: Waiting for guest om to shut down, 30 seconds left Sep 15 17:39:40 libvirtd[326]: internal error: End of file from monitor Sep 15 17:39:40 libvirtd[326]: Cannot write data: Broken pipe Sep 15 17:39:40 libvirt-guests.sh[1164]: Shutdown of guest om complete. Sep 15 17:40:07 libvirt-guests.sh[329]: Resuming guests on default URI... Sep 15 17:40:07 libvirt-guests.sh[329]: Resuming guest : error: failed to connect to the hypervisor Sep 15 17:40:07 libvirt-guests.sh[329]: error: no valid connection Sep 15 17:40:07 libvirt-guests.sh[329]: error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory Sep 15 17:40:07 libvirt-guests.sh[329]: Resuming guest : error: failed to connect to the hypervisor Sep 15 17:40:07 libvirt-guests.sh[329]: error: no valid connection Sep 15 17:40:07 libvirt-guests.sh[329]: error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory Sep 15 17:40:07 systemd[1]: libvirt-guests.service: Main process exited, code=exited, status=1/FAILURE Sep 15 17:40:07 systemd[1]: libvirt-guests.service: Unit entered failed state. Sep 15 17:40:07 systemd[1]: libvirt-guests.service: Failed with result 'exit-code'. Sep 15 17:40:07 libvirtd[325]: libvirt version: 2.1.0 Sep 15 17:40:07 libvirtd[325]: hostname: oo.localdomain Sep 15 17:40:07 libvirtd[325]: Cannot check dnsmasq binary /sbin/dnsmasq: No such file or directory Sep 15 17:40:07 libvirtd[325]: direct firewall backend requested, but /sbin/ebtables is not available: No such file or directory Sep 15 17:40:07 libvirtd[325]: internal error: Failed to initialize a valid firewall backend Sep 15 17:40:07 libvirtd[325]: internal error: Failed to find path for dmidecode binary Sep 15 17:40:07 libvirtd[325]: internal error: Failed to find path for dmidecode binary Sep 15 17:40:08 virtlogd[421]: libvirt version: 2.1.0 We see the same error as systemctl status plus some error like " internal error: End of file from monitor" and others type. Don't know if I have anything to take a look there ? Finally, I also notice some very strange thing: If I stop the service with systemctl stop libvirt-guests.service it tooks a while to execute the stop command and fails. Sep 15 18:00:44 libvirt-guests.sh[978]: Starting shutdown on guest: ape Sep 15 18:00:45 libvirt-guests.sh[978]: Waiting for guest ape to shut down, 30 seconds left Sep 15 18:00:50 libvirt-guests.sh[978]: Waiting for guest ape to shut down, 25 seconds left Sep 15 18:00:55 libvirt-guests.sh[978]: Waiting for guest ape to shut down, 20 seconds left Sep 15 18:01:00 libvirt-guests.sh[978]: Waiting for guest ape to shut down, 15 seconds left Sep 15 18:01:06 libvirt-guests.sh[978]: Waiting for guest ape to shut down, 10 seconds left Sep 15 18:01:11 libvirt-guests.sh[978]: Waiting for guest ape to shut down, 5 seconds left Sep 15 18:01:15 libvirt-guests.sh[978]: Shutdown of guest ape failed to complete in time. So my guests are not able to shutdown properly but when I shutdown -r now my host I can see that my guests have been shutdown gently, I guess there is something related on my issue.... Many thanks for your help again

On Thu, 2016-09-15 at 18:10 +0200, Benoit wrote:
Hi Andrea,
Hi, sorry for taking so long to reply. I'm still unable to reproduce your failure, which of course makes it very difficult to debug :( But I might have a couple leads, so let's try and move forward with that.
Since you're running an Arch variant, I'm going to assume all your software is basically at the latest version all around, isn't it? Yep exactly, all software / kernel are in latest version
From the log below it looks like you're running 2.1.0, which might be the latest version your distribution provides, but the same is not true when it comes to upstream development.
Can you try adding [Unit] Requires=libvirtd.service to your libvirt-guests.service via 'systemctl edit' and see if it helps? I don't think it should be needed for your use case, but it's probably a good idea to have it there anyways. I have tried and as you expected it didn't changed anything, same issue.
Can you please provide the output of # systemctl cat libvirt-guests.service and # systemctl show libvirt-guests.service as well? Those could contain helpful information.
I made more test and it is very reproductible, let me give you my steps to reproduce it : -I shutdown -r now my host. -when it restarted, first thing to check is : systemctl status libvirt-guests.service, I got : libvirt-guests.service - Suspend Active Libvirt Guests Loaded: loaded (/usr/lib/systemd/system/libvirt-guests.service; enabled; vendor preset: disabled)
From the output of systemctl it looks like you didn't add the Requires=libvirtd.service relationship. Maybe you just removed it after testing?
Active: failed (Result: exit-code) since Thu 2016-09-15 17:40:07 CEST; 1min 47s ago Docs: man:libvirtd(8) http://libvirt.org Main PID: 329 (code=exited, status=1/FAILURE) Sep 15 17:40:07 libvirt-guests.sh[329]: Resuming guest : error: failed to connect to the hypervisor Sep 15 17:40:07 libvirt-guests.sh[329]: error: no valid connection Sep 15 17:40:07 libvirt-guests.sh[329]: error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory Sep 15 17:40:07 libvirt-guests.sh[329]: Resuming guest : error: failed to connect to the hypervisor Sep 15 17:40:07 libvirt-guests.sh[329]: error: no valid connection Sep 15 17:40:07 libvirt-guests.sh[329]: error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory Sep 15 17:40:07 systemd[1]: libvirt-guests.service: Main process exited, code=exited, status=1/FAILURE Sep 15 17:40:07 systemd[1]: Failed to start Suspend Active Libvirt Guests. Sep 15 17:40:07 systemd[1]: libvirt-guests.service: Unit entered failed state. Sep 15 17:40:07 systemd[1]: libvirt-guests.service: Failed with result 'exit-code'. Even if the service is in FAILED mode, my guests are started correctly.
Is it possible that you have configured some (all?) of your guests to be autostarted? That would explain why the guests are running after reboot even though libvirt-guests failed to connect to libvirtd. Even then, libvirt-guests is written in such a way that the fact a guest has already been started should not result in failure. I've checked, and it looks to me like the socket is created very early in the libvirtd startup process, way before attempting to start autostarted guests, so I don't see how that could cause issues. Two things stand out from the log: 1) Resuming guest : error: failed to connect to the hypervisor That progress message is supposed to contain the guest name right after "guest", but you have an empty string instead. That's consistent with the script being unable to connect to libvirtd 2) the script can't connect to libvirtd which is extremely weird because, right before looping throught the guests and starting them, a connectivity check is performed to make sure libvirtd can be reached Can you please provide a full, verbose libvirtd log[1]? Can you also add[2] set -x to /usr/libexec/libvirt-guests.sh and then provide the output of # systemctl -b 0 -u libvirt-guests.service after rebooting the host?
If I do a systemctl restart libvirt-guests.service, everything is good : libvirt-guests.service - Suspend Active Libvirt Guests Loaded: loaded (/usr/lib/systemd/system/libvirt-guests.service; enabled; vendor preset: disabled) Active: active (exited) since Thu 2016-09-15 17:42:01 CEST; 9min ago Docs: man:libvirtd(8) http://libvirt.org Main PID: 605 (code=exited, status=0/SUCCESS) Sep 15 17:42:01 systemd[1]: Starting Suspend Active Libvirt Guests... Sep 15 17:42:01 systemd[1]: Started Suspend Active Libvirt Guests. Now if I do a shutdown -r now again, I can see both in my libvirt logs + in my guests log that an ACPI commands has been sent from libvirt and my guests shutdown gently, so everything is ok. When I start again, and try to do a systemctl status libvirt-guests.service, I got the same FAILED status as before.... I made a research on my logs : Sep 15 17:39:29 libvirt-guests.sh[1164]: Shutting down guests on default URI... Sep 15 17:39:29 libvirt-guests.sh[1164]: Starting shutdown on guest: ape Sep 15 17:39:30 libvirt-guests.sh[1164]: Waiting for guest ape to shut down, 30 seconds left
Looks like you changed at least a couple of settings here, eg. the timeout. Can you please provide the output of $ grep -Ev '^(#|$)' /etc/sysconfig/libvirt-guests so we can be sure we're on the same page?
Sep 15 17:39:35 libvirt-guests.sh[1164]: Waiting for guest ape to shut down, 25 seconds left Sep 15 17:39:36 libvirtd[326]: internal error: End of file from monitor Sep 15 17:39:36 libvirtd[326]: Cannot write data: Broken pipe
[...]
We see the same error as systemctl status plus some error like " internal error: End of file from monitor" and others type. Don't know if I have anything to take a look there ?
That might be a red herring, I think it's just the QEMU process being killed during shutdown and thus ending communication with libvirtd abruptly.
Finally, I also notice some very strange thing: If I stop the service with systemctl stop libvirt-guests.service it tooks a while to execute the stop command and fails. Sep 15 18:00:44 libvirt-guests.sh[978]: Starting shutdown on guest: ape Sep 15 18:00:45 libvirt-guests.sh[978]: Waiting for guest ape to shut down, 30 seconds left Sep 15 18:00:50 libvirt-guests.sh[978]: Waiting for guest ape to shut down, 25 seconds left Sep 15 18:00:55 libvirt-guests.sh[978]: Waiting for guest ape to shut down, 20 seconds left Sep 15 18:01:00 libvirt-guests.sh[978]: Waiting for guest ape to shut down, 15 seconds left Sep 15 18:01:06 libvirt-guests.sh[978]: Waiting for guest ape to shut down, 10 seconds left Sep 15 18:01:11 libvirt-guests.sh[978]: Waiting for guest ape to shut down, 5 seconds left Sep 15 18:01:15 libvirt-guests.sh[978]: Shutdown of guest ape failed to complete in time. So my guests are not able to shutdown properly but when I shutdown -r now my host I can see that my guests have been shutdown gently, I guess there is something related on my issue....
It looks like at least one of the guests is taking awfully long to shut down. What if you bump up the timeout? Does it manage to shut down before reaching it? [1] Follow instructions at http://wiki.libvirt.org/page/DebugLogs [2] When editing the file, add the new instruction *right after* the first line. Of course you should make a backup copy of the file before tweaking it -- Andrea Bolognani / Red Hat / Virtualization
participants (2)
-
Andrea Bolognani
-
Benoit