virsh feature request: cold restart and/or synchronous shutdown option
When libvirt components get updated on a hypervisor, VMs continue running with obsolete files. Programs like "needrestart" alert one to this condition with messages like:
VM guests are running outdated hypervisor (qemu) binaries on this host: 'bazquux' with pid 1234
Normal restarting of a VM (from within or with `virsh reboot ...`) does not resolve it because it's 'warm'; VMs must be completely shut down then started again to utilise the updates. Afaik, to do so reliably, in a way that minimises downtime, requires scripts like this:
(d=bazquux; virsh shutdown --domain "$d" && until LC_ALL=C LANG=C virsh domstate --domain "$d" | grep -Fi 'shut off' &> /dev/null; do echo ...; sleep 1; done && virsh start --domain "$d")
Adding a timeout complicates that further. Options for this logic to be performed by virsh would be beneficial, as a single operation:
virsh reboot --domain bazquux --cold
And by allowing poweroff to wait:
(d=bazquux; virsh shutdown --domain "$d" --synchronous && virsh start --domain "$d")
The latter would also assist with scripting changes to a VM that can only be performed whilst it is not running (shutdown -> alter -> start). Ideally, these would accept `--timeout n` so that if the shutdown portion was not completed quickly enough, they would fail and exit with a non-zero status, and the former would not start the VM again.
I'm not sure of a good way to handle this where `on_poweroff` is set to something that doesn't result in the machine being shut off.
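For illustration, the polling loop with a timeout added might look roughly like the sketch below. It uses only the `virsh shutdown`, `virsh domstate` and `virsh start` calls already shown in the post. The function name `cold_restart` and the 300-second default are hypothetical choices, not anything virsh provides, and the elapsed time is only approximated by counting one-second sleeps.

```shell
# cold_restart DOMAIN [TIMEOUT_SECONDS]
# Hypothetical helper: request a shutdown, poll the domain state until it
# reports "shut off", then start the domain again. Returns non-zero without
# starting the domain if the timeout elapses first.
cold_restart() (
    d=$1
    timeout=${2:-300}   # seconds; 300 is an arbitrary default for this sketch
    waited=0

    virsh shutdown --domain "$d" || exit 1

    # Poll once per second; LC_ALL=C keeps the state string untranslated.
    until LC_ALL=C virsh domstate --domain "$d" | grep -Fqi 'shut off'; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out waiting for $d to shut off" >&2
            exit 1
        fi
        sleep 1
        waited=$((waited + 1))   # approximate: ignores time spent in virsh
    done

    virsh start --domain "$d"
)
```

It could be called as `cold_restart bazquux 120`. The function body runs in a subshell, so its variables do not leak into the calling shell.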
On 3/12/26 01:51, roy-orbitson--- via Devel wrote:
When libvirt components get updated on a hypervisor, VMs continue running with obsolete files. Programs like "needrestart" alert one to this condition with messages like:
VM guests are running outdated hypervisor (qemu) binaries on this host: 'bazquux' with pid 1234
Normal restarting of a VM (from within or with `virsh reboot ...`) does not resolve it because it's 'warm'; VMs must be completely shut down then started again to utilise the updates. Afaik, to do so reliably, in a way that minimises downtime, requires scripts like this:
(d=bazquux; virsh shutdown --domain "$d" && until LC_ALL=C LANG=C virsh domstate --domain "$d" | grep -Fi 'shut off' &> /dev/null; do echo ...; sleep 1; done && virsh start --domain "$d")
Adding a timeout complicates that further. Options for this logic to be performed by virsh would be beneficial, as a single operation:
virsh reboot --domain bazquux --cold
And by allowing poweroff to wait:
(d=bazquux; virsh shutdown --domain "$d" --synchronous && virsh start --domain "$d")
There is an inherent problem with bundling two or more operations under a single command (and you point it out at the end): what to do when one operation in sequence fails? And specifically for powering off/rebooting a VM - that requires guest cooperation. What to do when the guest (silently) refuses the poweroff/reboot request (e.g. acpid is not running)? The moment virsh introduces a timeout there's going to be a user for whom the timeout is too short.
The latter would also assist with scripting changes to a VM that can only be performed whilst it is not running (shutdown -> alter -> start). Ideally, these would accept `--timeout n` so that if the shutdown portion was not completed quickly enough, they would fail and exit with a non-zero status, and the former would not start the VM again.
I'm not sure of a good way to handle this where `on_poweroff` is set to something that doesn't result in the machine being shut off.
Michal
You can concede that virtual machines need a cold start to utilise updated software, right? And that users needing to do so manually is suboptimal? It's a problem that I think needs addressing but your response seems a little dismissive.
Michal Prívozník wrote:
There is an inherent problem with bundling two or more operations under a single command (and you point it out at the end): what to do when one operation in sequence fails?
These are two separate problems; the "at the end" one is discussed at the bottom of this message. With the first, I don't believe it is inherent to bundling operations together. The same challenge exists with the current tooling, and virsh makes it the user's responsibility, but it could assist instead. If either operation fails, the single command should fail. I don't see how it could be any other way. Furthermore, bundled operations would solve the case where a VM is configured to restart in `on_poweroff`, where running a shutdown command alone never results in a 'cold' state. A cold reboot option would allow virsh to fully shut down the VM, knowing that the config obligation will be met by starting it again afterwards.
And specifically for powering off/rebooting a VM - that requires guest cooperation. What to do when the guest (silently) refuses the poweroff/reboot request (e.g. acpid is not running)?
Then the command would wait indefinitely (unless interrupted) or fail after the timeout, as stated previously. This is no different from a user script polling in a loop without a break, or a loop with a limit, with the current version of virsh. Is there a better way? This is what I've seen in real-world use, e.g.:
https://github.com/crc-org/snc/blob/2f4fb10767b7733d69946490c93820cd5ec3c9f4...
The other alternative I can think of is first running an event monitoring command in one shell, like:
(d=bazquux; LC_ALL=C LANG=C virsh event --domain "$d" --event lifecycle --loop | grep -qF ': Shutdown Finished' && virsh start --domain "$d")
then running the shutdown command in another. Both this and the example in my previous message are quite convoluted for automation. They're also fragile because they rely on parsing output intended for humans, which may change in future.
In cases where virsh could determine it would never succeed, even if it knew it was supposed to start the machine again, then it could return this failure immediately, without attempting to modify the VM's state.
With this feature, it would still be up to the user to deal with, which is no worse than the situation now. There are plenty of ways to write shell commands that will never complete. Some are built not to, like `yes`. That's not virsh's problem.
The moment virsh introduces a timeout there's going to be a user for whom the timeout is too short.
In this scenario, virsh doesn't introduce the timeout, the user does, and it's up to them to set its length. If it's too short, that's their fault. The default would be no timeout.
if the shutdown portion was not completed quickly enough, they would fail and exit with a non-zero status, and the former would not start the VM again.
A timeout would only need to apply to the shutdown operation. In a cold reboot, virsh would request the shutdown and return failure if it did not happen in time. If it did complete in time, virsh would return success only if the start operation also succeeded. The timeout could only cause the command to deliberately fail, not dictate whether or not the VM eventually shuts down.
I'm not sure of a good way to handle this where `on_poweroff` is set to something that doesn't result in the machine being shut off.
The second problem you reference above is this one, with a shutdown command like:
virsh shutdown --domain bazquux --synchronous
It may succeed but result in the VM being active afterwards, due to config, though what the user really wants is to wait until the machine is off. Given virsh could not know from that command whether the user intends to wait until it's completely off or just restarted, it might need a second option, like:
virsh shutdown --domain bazquux --synchronous --cold
Then the command could fail immediately (with an explanation) if config would prevent a 'cold' state, or wait indefinitely until that state is reached.
This feature request is about improving a common use case, not changing current or default command behaviour, nor covering every possible edge case, which virsh does not do now, anyway.
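The "fail immediately if config would prevent a cold state" check can be approximated in a script today by reading `<on_poweroff>` from the domain XML before requesting the shutdown. The following is only a sketch: the helper names `poweroff_action` and `will_power_off` are hypothetical, the `sed` expression assumes the element sits on a single line as `virsh dumpxml` normally prints it, and it treats `destroy` as the only action that leaves the domain shut off.

```shell
# poweroff_action DOMAIN
# Hypothetical helper: print the <on_poweroff> action from the domain XML.
# For a running domain, 'virsh dumpxml' shows the live definition.
# Returns non-zero if the XML cannot be read or the element is not found.
poweroff_action() (
    xml=$(virsh dumpxml --domain "$1") || exit 1
    action=$(printf '%s\n' "$xml" |
        sed -n 's:.*<on_poweroff>\(.*\)</on_poweroff>.*:\1:p')
    [ -n "$action" ] && printf '%s\n' "$action"
)

# will_power_off DOMAIN
# Succeeds only if a guest poweroff is expected to leave the domain shut off.
will_power_off() {
    [ "$(poweroff_action "$1")" = destroy ]
}
```

A wrapper could then refuse to proceed early, for example `will_power_off bazquux || { echo "on_poweroff would not shut the domain off" >&2; exit 1; }`.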
On Thu, Mar 12, 2026 at 00:51:54 -0000, roy-orbitson--- via Devel wrote:
When libvirt components get updated on a hypervisor, VMs continue running with obsolete files. Programs like "needrestart" alert one to this condition with messages like:
VM guests are running outdated hypervisor (qemu) binaries on this host: 'bazquux' with pid 1234
Normal restarting of a VM (from within or with `virsh reboot ...`) does not resolve it because it's 'warm'; VMs must be completely shut down then started again to utilise the updates. Afaik, to do so reliably, in a way that minimises downtime, requires scripts like this:
(d=bazquux; virsh shutdown --domain "$d" && until LC_ALL=C LANG=C virsh domstate --domain "$d" | grep -Fi 'shut off' &> /dev/null; do echo ...; sleep 1; done && virsh start --domain "$d")
Adding a timeout complicates that further. Options for this logic to be performed by virsh would be beneficial, as a single operation:
virsh reboot --domain bazquux --cold
And by allowing poweroff to wait:
(d=bazquux; virsh shutdown --domain "$d" --synchronous && virsh start --domain "$d")
Aside from the complexities Michal pointed out, I added 'virsh await' some time ago. It allows you to wait for a VM to reach a "condition"; one of the conditions I've added would allow you to do what you want:
virsh shutdown $DOM; virsh await --condition domain-inactive $DOM
'virsh await' terminates only after $DOM is in the inactive state. It also supports '--timeout' to break out from deadlocks if needed.
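A small wrapper around the two commands above, with the timeout made explicit, might look like this. It is a sketch under assumptions: the function name `shutdown_and_wait` is hypothetical, `--timeout` is assumed to take a number (seconds are assumed here), and `virsh await` is assumed to exit non-zero when the timeout expires. The virsh man page for the installed version should be checked for both.

```shell
# shutdown_and_wait DOMAIN [TIMEOUT]
# Hypothetical helper: request a guest shutdown, then block until the domain
# is inactive. The await step is skipped if the shutdown request itself fails.
# ASSUMPTIONS: --timeout takes a number (seconds assumed) and 'virsh await'
# returns non-zero if the condition is not met in time.
shutdown_and_wait() {
    virsh shutdown "$1" &&
        virsh await --condition domain-inactive --timeout "${2:-120}" "$1"
}
```

For the shutdown -> alter -> start workflow, this could be used as `shutdown_and_wait bazquux 300 && virsh start bazquux`, with the alteration step inserted before the start.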
Ahh, that is excellent. I didn't find that, and the version I'm working with pre-dates that feature.
If I'm correct in thinking this does not cover the case where `on_poweroff` is set to restart, which cannot be altered on a running VM, so the await can never see an inactive state... then a cold reboot option could still be worthwhile.
Otherwise, I guess one must run something like:
virsh destroy <domain> --graceful && virsh await <domain> --condition domain-inactive && virsh start <domain>
On Fri, Mar 13, 2026 at 06:01:56 -0000, Roy Orbitson via Devel wrote:
Ahh, that is excellent. I didn't find that, and the version I'm working with pre-dates that feature.
If I'm correct in thinking this does not cover the case where `on_poweroff` is set to restart, which cannot be altered on a running VM, so the await can never see an inactive state... then a cold reboot option could still be worthwhile.
You can also temporarily override the poweroff action via:
virsh set-lifecycle-action DOM poweroff destroy --live
That doesn't influence the next-boot config.
Otherwise, I guess one must run something like:
virsh destroy <domain> --graceful && virsh await <domain> --condition domain-inactive && virsh start <domain>
'virsh destroy' even with the --graceful flag will not wait for OS shutdown. It will just avoid sending SIGKILL if the previous SIGTERM is ignored by (a stuck) qemu. A normal 'virsh destroy' does use SIGTERM too, just follows up with SIGKILL if it's taking too long. Thus using await in the above scenario makes little sense because the return code from virsh destroy will tell you if qemu quit or not.
So, to clarify, the best way to power-cycle a (persistent) VM and minimise downtime is:
virsh set-lifecycle-action $DOM poweroff destroy --live && virsh await $DOM --condition domain-inactive && virsh start $DOM
I still think this would be nicer:
virsh reboot $DOM --cold
On Fri, Mar 13, 2026 at 07:46:30 -0000, Roy Orbitson via Devel wrote:
So, to clarify, the best way to power-cycle a (persistent) VM and minimise downtime is:
virsh set-lifecycle-action $DOM poweroff destroy --live && virsh await $DOM --condition domain-inactive && virsh start $DOM
'virsh set-lifecycle-action' just changes the setting of <on_poweroff>. You still need to initiate the shutdown via virsh shutdown.
I still think this would be nicer:
virsh reboot $DOM --cold
Compounding too many operations into one is brittle and may bake in assumptions that do not make sense in other scenarios.
E.g. the above case could wait an unbounded amount of time if the guest OS ignores the graceful shutdown. What to do when such a thing happens is a policy decision, as in some cases you might not want to just 'virsh destroy' the VM and potentially lose data.
'virsh await' has --timeout but what to do when the timeout is reached isn't IMO for us to decide. The same would apply if we were to compound everything into what you propose.
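Putting the pieces from this exchange together (the live lifecycle override, the explicit `virsh shutdown`, then `virsh await` and `virsh start`), a script-side power cycle could be sketched as follows. The function name `power_cycle` and the exit code 2 are hypothetical conventions for this sketch, and it assumes `virsh await` returns non-zero when its `--timeout` (seconds assumed) expires. In line with the point about policy, it does not fall back to `virsh destroy` on its own.

```shell
# power_cycle DOMAIN [TIMEOUT]
# Hypothetical helper. Exit codes (conventions of this sketch only):
#   0 = domain was shut down and started again
#   1 = a virsh request failed outright
#   2 = the guest did not become inactive within the timeout
power_cycle() {
    # Live-only override, so the next-boot config is not changed.
    virsh set-lifecycle-action "$1" poweroff destroy --live || return 1
    virsh shutdown "$1" || return 1
    # ASSUMPTION: 'virsh await' exits non-zero when --timeout expires.
    if ! virsh await "$1" --condition domain-inactive --timeout "${2:-120}"; then
        # Deliberately no 'virsh destroy' here: whether to force the guest
        # off (and risk data loss) is left to the caller.
        return 2
    fi
    virsh start "$1"
}
```

A caller can then apply its own policy, e.g. `power_cycle bazquux 300 || echo "bazquux not restarted (status $?)" >&2`.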
On Fri, Mar 13, 2026 at 09:06:58AM +0100, Peter Krempa via Devel wrote:
On Fri, Mar 13, 2026 at 07:46:30 -0000, Roy Orbitson via Devel wrote:
So, to clarify, the best way to power-cycle a (persistent) VM and minimise downtime is:
virsh set-lifecycle-action $DOM poweroff destroy --live && virsh await $DOM --condition domain-inactive && virsh start $DOM
'virsh set-lifecycle-action' just changes the setting of <on_poweroff>. You still need to initiate the shutdown via virsh shutdown.
I still think this would be nicer:
virsh reboot $DOM --cold
Compounding too many operations into one is brittle and may bake in assumptions that do not make sense in other scenarios.
E.g. the above case could wait an unbounded amount of time if the guest OS ignores the graceful shutdown. What to do when such a thing happens is a policy decision, as in some cases you might not want to just 'virsh destroy' the VM and potentially lose data.
'virsh await' has --timeout but what to do when the timeout is reached isn't IMO for us to decide. Same would apply if we were to compound everything into what you propose.
I would highly recommend not trying to do this in shell using virsh. This is far better suited to writing a short python script using the libvirt-python API, where there is much more flexibility and easier control to get the behaviour desired.
With regards,
Daniel
--
|: https://berrange.com ~~ https://hachyderm.io/@berrange :|
|: https://libvirt.org ~~ https://entangle-photo.org :|
|: https://pixelfed.art/berrange ~~ https://fstop138.berrange.com :|
participants (5)
- Daniel P. Berrangé
- Michal Prívozník
- Peter Krempa
- Roy Orbitson
- roy-orbitson@devo.net.au