[libvirt] What to do about the qemu "-boot strict" option

Awhile back a bug was filed against libvirt about the inability to completely exclude a disk from the boot order:

https://bugzilla.redhat.com/show_bug.cgi?id=888635

In short, you can't have a domain that used PXE to boot, but also has an un-bootable disk device *even if that disk isn't listed in the boot order*, because if PXE times out (e.g. due to the bridge forwarding delay), the BIOS will move on to the next target, which will be the unbootable disk device (again - even though it wasn't given a boot order), and get stuck at a "BOOT DISK FAILURE, PRESS ANY KEY" message until a user intervenes.

It was obviously beyond the ability of libvirt to fix this (although it can be worked around by creating a very small disk image with a bootloader that merely instructs the system to reboot, and placing *that* disk in the boot order just after the PXE device), so the BZ was closed as CANTFIX.

A couple days ago I noticed that Amos Kong had later actually fixed this problem in seabios and qemu:

https://bugzilla.redhat.com/show_bug.cgi?id=888633
https://bugzilla.redhat.com/show_bug.cgi?id=903204

Existing behavior is preserved though, and the new behavior only comes about if "-boot strict" is specified on the qemu commandline.

It definitely seems desirable to have this ability in libvirt, but I'm almost of the opinion that this should *always* be the behavior (if you want all devices to be in the boot order, you can just give all of them (or none of them, if you're feeling adventurous) a boot order ranking). But I thought it would be prudent to ask opinions about that before making any patch.

So what are the opinions? Should the "if any devices are given a boot order, only attempt to boot from devices that have a boot order specified" behavior just be the default (and only) behavior when qemu/seabios supports it? (this would imply that the old behavior is just a bug)? Or do we need to make it configurable?

If it needs to be configurable, the boot-related xml seems to be a bit unorganized (a flat list of elements with mostly a single attribute for each), but I suppose this could be added as a new attribute to the <bios> element...
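To make the two behaviors concrete, here is a toy model of the rule quoted above ("if any devices are given a boot order, only attempt to boot from devices that have a boot order specified"). It is only a sketch of the semantics as described in this message, not SeaBIOS or qemu code; the `Device` class, the `boot_candidates` function, and the device names are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Device:
    name: str
    boot_order: Optional[int] = None  # None means "no boot order given"


def boot_candidates(devices: List[Device], strict: bool) -> List[str]:
    """Return device names in the order a firmware might try them (toy model)."""
    ordered = sorted(
        (d for d in devices if d.boot_order is not None),
        key=lambda d: d.boot_order,
    )
    unordered = [d for d in devices if d.boot_order is None]

    # Nothing was given a boot order: every device stays a candidate.
    if not ordered:
        return [d.name for d in devices]
    # Strict behavior: stop after the explicitly ordered devices.
    if strict:
        return [d.name for d in ordered]
    # Old behavior: fall through to devices that were never given an order.
    return [d.name for d in ordered + unordered]


devices = [Device("disk0"), Device("net0", boot_order=1)]
print(boot_candidates(devices, strict=False))
print(boot_candidates(devices, strict=True))
```

In this model the unbootable disk is still reached after PXE in the non-strict case, which is the situation described in the bug, and it drops out of the list entirely in the strict case.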

On Wed, Nov 27, 2013 at 14:37:02 +0200, Laine Stump wrote:
Awhile back a bug was filed against libvirt about the inability to completely exclude a disk from the boot order:
https://bugzilla.redhat.com/show_bug.cgi?id=888635
In short, you can't have a domain that used PXE to boot, but also has an un-bootable disk device *even if that disk isn't listed in the boot order*, because if PXE times out (e.g. due to the bridge forwarding delay), the BIOS will move on to the next target, which will be the unbootable disk device (again - even though it wasn't given a boot order), and get stuck at a "BOOT DISK FAILURE, PRESS ANY KEY" message until a user intervenes.
It was obviously beyond the ability of libvirt to fix this (although it can be worked around by creating a very small disk image with a bootloader that merely instructs the system to reboot, and placing *that* disk in the boot order just after the PXE device), so the BZ was closed as CANTFIX.
A couple days ago I noticed that Amos Kong had later actually fixed this problem in seabios and qemu:
https://bugzilla.redhat.com/show_bug.cgi?id=888633 https://bugzilla.redhat.com/show_bug.cgi?id=903204
Existing behavior is preserved though, and the new behavior only comes about if "-boot strict" is specified on the qemu commandline.
It definitely seems desirable to have this ability in libvirt, but I'm almost of the opinion that this should *always* be the behavior (if you want all devices to be in the boot order, you can just give all of them (or none of them, if you're feeling adventurous) a boot order ranking). But I thought it would be prudent to ask opinions about that before making any patch.
So what are the opinions? Should the "if any devices are given a boot order, only attempt to boot from devices that have a boot order specified" behavior just be the default (and only) behavior when qemu/seabios supports it? (this would imply that the old behavior is just a bug)? Or do we need to make it configurable?
I would consider the old behavior as a bug and just use -boot strict whenever we can.

Jirka
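As a rough illustration of "use -boot strict whenever we can", the sketch below assembles a qemu argument list in which only the network device carries a `bootindex`, and appends `-boot strict=on` when the binary is known to support it. This is not libvirt's command-line builder; the `qemu_boot_args` function, the `supports_strict` flag, the device IDs, and the `data.img` file name are all assumptions made for the example.

```python
from typing import List


def qemu_boot_args(supports_strict: bool) -> List[str]:
    """Build boot-related qemu arguments for a PXE guest with a data disk."""
    args = [
        # The NIC is the only device that is given a boot index.
        "-netdev", "user,id=net0",
        "-device", "virtio-net-pci,netdev=net0,bootindex=1",
        # The data disk (hypothetical image name) has no bootindex at all.
        "-drive", "file=data.img,if=none,id=drive0,format=raw",
        "-device", "virtio-blk-pci,drive=drive0",
    ]
    if supports_strict:
        # Only try devices that were given a boot index.
        args += ["-boot", "strict=on"]
    return args


print(" ".join(qemu_boot_args(supports_strict=True)))
```

With an older qemu the flag would simply be left off and the guest would keep the previous fall-through behavior.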

On Wed, Nov 27, 2013 at 02:37:02PM +0200, Laine Stump wrote:
Awhile back a bug was filed against libvirt about the inability to completely exclude a disk from the boot order:
https://bugzilla.redhat.com/show_bug.cgi?id=888635
In short, you can't have a domain that used PXE to boot, but also has an un-bootable disk device *even if that disk isn't listed in the boot order*, because if PXE times out (e.g. due to the bridge forwarding delay), the BIOS will move on to the next target, which will be the unbootable disk device (again - even though it wasn't given a boot order), and get stuck at a "BOOT DISK FAILURE, PRESS ANY KEY" message until a user intervenes.
It was obviously beyond the ability of libvirt to fix this (although it can be worked around by creating a very small disk image with a bootloader that merely instructs the system to reboot, and placing *that* disk in the boot order just after the PXE device), so the BZ was closed as CANTFIX.
I'm fairly sure that the current behaviour we have is a regression vs the original libvirt QEMU driver prior to use of seabios. IOW, I think we should unconditionally be enabling strict=on to fix the flaw.

Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Wed, Nov 27, 2013 at 02:37:02PM +0200, Laine Stump wrote:
Awhile back a bug was filed against libvirt about the inability to completely exclude a disk from the boot order:
https://bugzilla.redhat.com/show_bug.cgi?id=888635
In short, you can't have a domain that used PXE to boot, but also has an un-bootable disk device *even if that disk isn't listed in the boot order*, because if PXE times out (e.g. due to the bridge forwarding delay), the BIOS will move on to the next target, which will be the unbootable disk device (again - even though it wasn't given a boot order), and get stuck at a "BOOT DISK FAILURE, PRESS ANY KEY" message until a user intervenes.
It was obviously beyond the ability of libvirt to fix this (although it can be worked around by creating a very small disk image with a bootloader that merely instructs the system to reboot, and placing *that* disk in the boot order just after the PXE device), so the BZ was closed as CANTFIX.
We have a reboot-timeout boot parameter that reboots the guest if no bootable device is found.

| commit ac05f3492421caeb05809ffa02c6198ede179e43
| Author: Amos Kong <akong@redhat.com>
| Date: Fri Sep 7 11:11:03 2012 +0800
|
| add a boot parameter to set reboot timeout
|
| Added an option to let qemu transfer a configuration file to bios,
| "etc/boot-fail-wait", which could be specified by command
| -boot reboot-timeout=T
| T have a max value of 0xffff, unit is ms.
|
| With this option, guest will wait for a given time if not find
| bootabled device, then reboot. If reboot-timeout is '-1', guest
| will not reboot, qemu passes '-1' to bios by default.
|
| This feature need the new seabios's support.
|
| Seabios pulls the value from the fwcfg "file" interface, this
| interface is used because SeaBIOS needs a reliable way of
| obtaining a name, value size, and value. It in no way requires
| that there be a real file on the user's host machine.
|
| Signed-off-by: Amos Kong <akong@redhat.com>
| Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
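The limits stated in the commit message (milliseconds, maximum 0xffff, '-1' meaning "do not reboot") can be sketched as a small helper that produces the `-boot reboot-timeout=T` argument. The `reboot_timeout_arg` function is hypothetical, and rejecting out-of-range values with an exception is a choice made for this example, not a statement about what qemu itself does with them.

```python
from typing import List

MAX_REBOOT_TIMEOUT_MS = 0xFFFF  # upper bound given in the commit message


def reboot_timeout_arg(timeout_ms: int) -> List[str]:
    """Return the qemu arguments for a given reboot timeout in milliseconds.

    -1 means "never reboot", which the commit message describes as the default.
    """
    if timeout_ms != -1 and not 0 <= timeout_ms <= MAX_REBOOT_TIMEOUT_MS:
        raise ValueError(
            f"reboot timeout must be -1 or 0..{MAX_REBOOT_TIMEOUT_MS} ms, "
            f"got {timeout_ms}"
        )
    return ["-boot", f"reboot-timeout={timeout_ms}"]


print(reboot_timeout_arg(5000))
```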
A couple days ago I noticed that Amos Kong had later actually fixed this problem in seabios and qemu:
https://bugzilla.redhat.com/show_bug.cgi?id=888633 https://bugzilla.redhat.com/show_bug.cgi?id=903204
Existing behavior is preserved though, and the new behavior only comes about if "-boot strict" is specified on the qemu commandline.
It definitely seems desirable to have this ability in libvirt, but I'm almost of the opinion that this should *always* be the behavior (if you want all devices to be in the boot order, you can just give all of them (or none of them, if you're feeling adventurous) a boot order ranking).
We leave the default as off just for compatibility with old qemu. For libvirt code, you can always use "strict=on"
But I thought it would be prudent to ask opinions about that before making any patch.
So what are the opinions? Should the "if any devices are given a boot order, only attempt to boot from devices that have a boot order specified" behavior just be the default (and only) behavior when qemu/seabios supports it? (this would imply that the old behavior is just a bug)? Or do we need to make it configurable? If it needs to be configurable, the boot-related xml seems to be a bit unorganized (a flat list of elements with mostly a single attribute for each), but I suppose this could be added as a new attribute to the <bios> element...
-- Amos.

On 11/28/2013 01:56 AM, Amos Kong wrote:
On Wed, Nov 27, 2013 at 02:37:02PM +0200, Laine Stump wrote:
Awhile back a bug was filed against libvirt about the inability to completely exclude a disk from the boot order:
https://bugzilla.redhat.com/show_bug.cgi?id=888635
In short, you can't have a domain that used PXE to boot, but also has an un-bootable disk device *even if that disk isn't listed in the boot order*, because if PXE times out (e.g. due to the bridge forwarding delay), the BIOS will move on to the next target, which will be the unbootable disk device (again - even though it wasn't given a boot order), and get stuck at a "BOOT DISK FAILURE, PRESS ANY KEY" message until a user intervenes.
It was obviously beyond the ability of libvirt to fix this (although it can be worked around by creating a very small disk image with a bootloader that merely instructs the system to reboot, and placing *that* disk in the boot order just after the PXE device), so the BZ was closed as CANTFIX.
We have a reboot-timeout boot parameter that reboots the guest if no bootable device is found.
Yes. libvirt supports that parameter. The problem was that the disk was "bootable", but just happened to boot into an "operating system" whose only function was to print out BOOT DISK FAILURE, PRESS ANY KEY then wait indefinitely for a key to be pressed :-)
A couple days ago I noticed that Amos Kong had later actually fixed this problem in seabios and qemu:
https://bugzilla.redhat.com/show_bug.cgi?id=888633 https://bugzilla.redhat.com/show_bug.cgi?id=903204
Existing behavior is preserved though, and the new behavior only comes about if "-boot strict" is specified on the qemu commandline.
It definitely seems desirable to have this ability in libvirt, but I'm almost of the opinion that this should *always* be the behavior (if you want all devices to be in the boot order, you can just give all of them (or none of them, if you're feeling adventurous) a boot order ranking).
We leave the default as off just for compatibility with old qemu. For libvirt code, you can always use "strict=on"
Right. The fact that it's configurable in qemu raises the question of whether there may be legitimate cases for libvirt where someone would expect/demand the old behavior, and the new behavior would cause breakage of existing setups. If there are, I don't want to prevent them by making it unconfigurable, but if there aren't, I don't want to clutter up config with yet another unnecessary knob.
participants (4)
- Amos Kong
- Daniel P. Berrange
- Jiri Denemark
- Laine Stump