On 01/23/14 15:58, Daniel P. Berrange wrote:
On Wed, Jan 22, 2014 at 01:40:26PM +0100, Laszlo Ersek wrote:
> On 01/22/14 12:45, Laine Stump wrote:
>> On 01/22/2014 12:45 PM, Daniel P. Berrange wrote:
>>> On Wed, Jan 22, 2014 at 01:33:18AM +0100, Laszlo Ersek wrote:
>>>> Recently,
>>>>
>>>> commit 96fddee322c7d39a57cfdc5e7be71326d597d30a
>>>> Author: Laine Stump <laine(a)laine.org>
>>>> Date: Mon Dec 2 14:07:12 2013 +0200
>>>>
>>>> qemu: add "-boot strict" to commandline whenever possible
>>>>
>>>> introduced a regression for OVMF guests. The symptoms and causes are
>>>> described in patch 3/4, and in
>>>>
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1056258
>>>>
>>>> Let's allow users to opt-out of "-boot strict=on" while preserving
>>>> it as default.
>>> I don't really get from that bug description why this can't be
>>> made to work as desired in OVMF. It seems like it is just a
>>> bug in the OVMF impl that it doesn't work.
>>
>> I was on the verge of making that same comment in question form. From
>> the information in the patches and the BZ, it sounds like either "--boot
>> strict" is implemented incorrectly for OVMF, or OVMF doesn't do the
>> proper thing with the "HALT".
>>
>> What does OVMF do with bootable devices that aren't given a specific
>> boot order? For seabios, those devices are all on the boot list
>> following those with specific orders; this is what necessitates --boot
>> strict. The behavior of the option should be consistent regardless of
>> BIOS choice.
>
> Here's again how OVMF works, in detail.
>
> First, the list of openfirmware device paths is downloaded from fw_cfg.
>
> Then they are translated to UEFI device path *prefixes*. This
> translation (even just for the prefixes) is inexact (best effort),
> because no complete mapping exists. Also it can only cover UEFI device
> path prefixes because the OpenFirmware device paths don't extend into
> file paths. In UEFI you can have two separate boot options that boot two
> separate files from the exact same device (including partition), and you
> can't distinguish these in OpenFirmware device paths. Certainly not on
> the qemu command line.
>
> OK, so now you have two lists, the list of UEFI boot options (pre-set by
> the user in the firmware, or auto-generated by the firmware, doesn't
> matter), and the translated prefix list from qemu/fw_cfg.
>
> OVMF then iterates over the fw_cfg list, looks up the first prefix match
> from the UEFI boot option list that matches the current translated
> fw_cfg entry. If it is found, then this UEFI boot option is appended to
> the output list, and the UEFI boot option is also marked as having been
> added to the output list.
>
> When the outer loop completes, you have a third list (the output list)
> which describes the user's boot preference. You also have some boot
> options that are unmarked (left unmatched by any translated fw_cfg
> entry). The question is what you do with these.
>
> Originally, I simply dropped these. This is precisely the -boot
> strict=on behavior. And it was wrong. Users wanted to keep at least
> *some* of these entries at the end of the list. My first question was
> "ok why don't you just specify those in fw_cfg?" And the answer is that
> those options *cannot* be specified.
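
For illustration only, the matching loop described in the quoted text can be sketched in Python. This is not OVMF's code (OVMF is written in C). The names BootOption, order_boot_options and keep_unmatched are invented for this sketch, the device path strings are simplified placeholders, and skipping an already marked option is an assumption. Dropping the unmarked options corresponds to the "-boot strict=on" behavior; here the alternative simply keeps all of them, although users only asked to keep some.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BootOption:
    description: str   # human-readable label (illustrative)
    device_path: str   # simplified textual device path (illustrative)


def order_boot_options(fw_cfg_prefixes, boot_options, keep_unmatched):
    """Order UEFI boot options by the translated fw_cfg prefix list."""
    output = []
    marked = set()  # indices of boot options already added to the output
    for prefix in fw_cfg_prefixes:
        for index, option in enumerate(boot_options):
            # Assumption: an option that is already marked is skipped.
            if index not in marked and option.device_path.startswith(prefix):
                output.append(option)
                marked.add(index)
                break  # only the first prefix match is taken
    if keep_unmatched:
        # Append the options that no fw_cfg entry matched (e.g. the shell).
        output.extend(o for i, o in enumerate(boot_options) if i not in marked)
    return output


# Placeholder data, not taken from a real guest.
options = [
    BootOption("EFI Internal Shell", "Fv(shell-volume)/FvFile(shell)"),
    BootOption("Fedora", "PciRoot(0x0)/Pci(0x4,0x0)/HD(1,GPT,guid-a)/\\EFI\\fedora\\shim.efi"),
    BootOption("PXE", "PciRoot(0x0)/Pci(0x3,0x0)/MAC(525400123456,0x1)"),
]
prefixes = ["PciRoot(0x0)/Pci(0x3,0x0)", "PciRoot(0x0)/Pci(0x4,0x0)"]

strict_like = order_boot_options(prefixes, options, keep_unmatched=False)
lenient = order_boot_options(prefixes, options, keep_unmatched=True)
print([o.description for o in strict_like])  # ['PXE', 'Fedora']
print([o.description for o in lenient])      # ['PXE', 'Fedora', 'EFI Internal Shell']
```

In the strict-like call the shell entry disappears from the result, which is the regression discussed in this thread.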

Ok, so this is the crux of the matter, it seems to me.
There are a bunch of boot options which cannot be expressed on
the QEMU command line, or libvirt XML. Users are trying to deal
with this problem by going behind QEMU/libvirt's back and hacking
the firmware image to encode these options.

You are technically right.

I only have a problem with the word "hacking". Setting UEFI boot options
(which also includes saving them in flash memory, which indeed can be
considered part of the firmware), is *valid*. It's not "going behind"
anyone's back. It's a perfectly valid thing to do in a UEFI guest (even
at runtime, with the efibootmgr Linux utility for example).

I place this in the same category as people who replace the QEMU
binary with a shell wrapper script to add a bunch of extra cli
args. ie totally unsupportable.

I agree that wrapper scripts are unsupportable.

I disagree that users setting their persistent boot variables from
inside the guest fall in the same category. That feature is an
unalienable part of UEFI.

The goal we have is that the XML description should fully describe
the configuration of a VM. Having a situation where the XML config
only partially describes the setup, and the rest is delegated to
a config embedded in the firmware image is not something we wish
to support.

I understand.

IMHO what we need to tackle here is the inability to properly
configure the firmware boot order from QEMU / libvirt. ie make
it possible for users to fully control it via libvirt XML.

We'd face two hurdles towards this goal.

- The first is that you'd need to get a basically free-form string
  through. Technically it wouldn't be very hard, but it's completely
  foreign to the current bootindex concept in libvirt/qemu. In UEFI,
  "bootable device" can refer to something that's just a chunk of
  guest RAM for qemu.

- The second hurdle is that you couldn't *offer* the host-side user sane
  choices (device paths for UEFI boot options) beyond a limit. This is
  because device paths come into existence by the execution and stacking
  of UEFI drivers, and their binding to devices.

Notice that the current OVMF approach is not

  generate device paths

It is

  translate OFW paths to UEFI devpath prefixes, heuristically, and try
  to match them against existent device paths that have come to be due
  to the execution of the driver stack

When a user is picking a new boot option at the UEFI config screen, or
at the UEFI shell prompt, then those UEFI drivers *exist*. The device
paths can be enumerated, traversed, and the user can indeed browse the
file systems on various devices, and pick whatever he wants. Then the
corresponding device path is saved, and UEFI can boot it (and OVMF can
match against it too).

When a user tries to pick a new boot option at runtime (i.e. after
ExitBootServices(), e.g. with efibootmgr), the UEFI drivers and the device
paths generated by them don't exist. Consequently efibootmgr can only
create relative HD() boot paths (because it can access the GUIDs in GPT
partition tables at runtime). "Relative HD() boot option" is a supported
concept in UEFI, because the GUID in the GPT entry is globally unique in
isolation too, so the UEFI code can look it up at boot time, wherever
it is. The filename to boot from that filesystem must be specified though.

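As a rough sketch of why a relative HD() option can be resolved at boot time, the lookup can be modeled in Python as a search for the HD() node among the device paths that exist once the drivers have run. The function name expand_relative_hd is made up, the path strings are simplified placeholders rather than real device path text, and "guid-a" stands in for a real partition GUID.

```python
def expand_relative_hd(short_path, device_paths):
    """Resolve a short-form HD() path against the known device paths.

    Returns the expanded path, or None if no known device has the partition.
    """
    # Split "HD(...)/\\file" into the HD() node and the file to boot.
    hd_node, _, file_path = short_path.partition("/")
    for device_path in device_paths:
        nodes = device_path.split("/")
        if hd_node in nodes:
            # Keep everything up to and including the matching HD() node.
            prefix = nodes[: nodes.index(hd_node) + 1]
            return "/".join(prefix + [file_path])
    return None


# Placeholder device paths, as they might exist at boot time.
known = [
    "PciRoot(0x0)/Pci(0x3,0x0)/MAC(525400123456,0x1)",
    "PciRoot(0x0)/Pci(0x4,0x0)/Scsi(0x0,0x0)/HD(1,GPT,guid-a)",
]
resolved = expand_relative_hd("HD(1,GPT,guid-a)/\\EFI\\fedora\\shim.efi", known)
print(resolved)
```

The disk can sit behind any controller; only the partition GUID and the filename need to be known in advance.
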
As soon as you want to configure a netboot (PXE) option, you need to
supply extra information on efibootmgr's command line, because the
"relative NIC boot option" concept doesn't exist in UEFI.

So in libvirt the same issue would come up. By browsing the guest's
disks, relative HD() boot paths could be generated and offered, but
nothing else. No netboot option that's directly bootable by UEFI, and
certainly no option for the memory mapped shell.

We'd have to invent some simple grammar that would trickle from libvirt
to qemu to OVMF. Like GPT GUIDs + filenames for disk boots, MACs + remote
filenames for NICs (PXE), and the word SHELL for the shell. Then OVMF's
own boot policy could try to look these up among the existing handles
and device paths, and turn them into real boot options. Ad absurdum, a
"UEFI guest agent" could be imagined (hw serial port is available,
virtio-serial driver could be written), providing libvirt with "everything".

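To make the idea concrete, here is a purely hypothetical Python sketch of such a grammar. No such syntax exists in libvirt, qemu or OVMF; the keywords DISK and PXE, the whitespace-separated layout, and the names BootEntry and parse_boot_entry are all invented here. Only the three kinds of entry (GPT GUID + filename, MAC + remote filename, the word SHELL) come from the paragraph above.

```python
import re
import uuid
from typing import NamedTuple, Optional


class BootEntry(NamedTuple):
    kind: str                # "disk", "pxe" or "shell"
    key: Optional[str]       # GPT partition GUID or NIC MAC address
    filename: Optional[str]  # file on the partition, or remote file


MAC_RE = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$")


def parse_boot_entry(line):
    """Parse one entry of the hypothetical boot grammar."""
    fields = line.strip().split(None, 2)
    if fields == ["SHELL"]:
        return BootEntry("shell", None, None)
    if len(fields) != 3:
        raise ValueError(f"malformed boot entry: {line!r}")
    keyword, key, filename = fields
    if keyword == "DISK":
        # uuid.UUID raises ValueError on a malformed GUID.
        return BootEntry("disk", str(uuid.UUID(key)), filename)
    if keyword == "PXE":
        key = key.lower()
        if not MAC_RE.match(key):
            raise ValueError(f"malformed MAC address: {key!r}")
        return BootEntry("pxe", key, filename)
    raise ValueError(f"unknown boot entry type: {keyword!r}")


# Example entries with placeholder values.
for text in (
    "DISK 12345678-1234-5678-1234-567812345678 \\EFI\\BOOT\\BOOTX64.EFI",
    "PXE 52:54:00:AB:CD:EF boot/grubx64.efi",
    "SHELL",
):
    print(parse_boot_entry(text))
```

The firmware side would still have to look each parsed entry up among its handles and device paths, which is the harder half of the problem.
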
I guess all this is possible to implement in a multi-(month|year)
dedicated project. Alas, hardwiring '-boot strict=on' is breaking the
OVMF logic *right now*.

If you're opposed to taking this series, then I'll change OVMF to
recognize and ignore HALT. It won't be the Right Thing (TM) -- because a
command line qemu user passing '-boot strict=on' manually, and *not*
wanting to boot the shell, might be surprised -- but it will be the
least wrong solution for now. Leaving the code as-is breaks the current
boot order logic 100%. Obeying HALT (which cannot be turned off) would
prevent all libvirt+OVMF users from reaching the UEFI shell (which
people do want to reach).

Do you suggest that I do that? I'm fine with it (I even started coding
it before deciding to fix libvirt instead), so please feel free to
suggest that.

Thanks!
Laszlo