Re: [libvirt] [PATCH v2] qemu: snapshot: Forbid internal snapshots with pflash firmware

23 Mar 2017


      On 03/23/17 15:07, Peter Krempa wrote:
...
On Thu, Mar 23, 2017 at 11:03:02 +0100, Laszlo Ersek wrote:
...
On 03/23/17 10:54, Peter Krempa wrote:
...
On Thu, Mar 23, 2017 at 10:48:01 +0100, Laszlo Ersek wrote:
...
On 03/23/17 10:29, Peter Krempa wrote:
...
If the variable store (<nvram>) file is raw qemu can't do a snapshot of
it and thus the snapshot would be incomplete. QEMU does no reject such
snapshot.
Additionally allowing to use a qcow2 variable store backing file would
solve this issue but then it would become eligible to become target of
the memory dump.
Offline internal snapshot would be incomplete too with either storage
format since libvirt does not handle the pflash file in this case.
Forbid such snapshot so that we can avoid problems.
[...]
...
...
@@ -13873,8 +13873,14 @@ qemuDomainSnapshotPrepare(virConnectPtr conn,
         goto cleanup;
     }
-    /* Internal snapshots don't work with VMs with OVMF loader since qemu does
-     * not snapshot the variable store */
+    /* internal snapshots + pflash based loader have the following problems:
+     * - if the variable store is raw, the snapshot is incomplete
+     * - alowing a qcow2 image as the varstore would make it eligible to receive
+     *   the vmstate dump, which would make it huge
+     * - offline snapshot would not snapshot the varstore at all
+     *
+     * Avoid the issues by forbidding this completely.
+     */
I thought about this a bit more and I think that while there are the
above problems we still can have users of snapshots + OVMF which use it
successfully. Forbiding it would create a regression for them since they
did not observe anything bad despite the problems mentioned above:
The reasons are following:
1) internal snapshots are the default in virt-manager
2) guests usually don't re-write the varstore very often, usually only
at install
3) OSes usually don't modify anything besides the boot entry
4) snapshot of an online VM carries the varstore in the memory image
5) OSes are pretty good at restoring the boot entry if it fails
Due to the facts above I think that there are users that legitimately
think that snapshots with pflash loaders work as expected. It's mostly
due to the fact that the data are pretty static and OSes don't store
anything important there and are able to self-heal some of the problems.
I think we should not disallow this to avoid usability regressions. We
can add documentation that states that it's unsafe to do snapshots.
Additionally we will need to add support for external snapshots, which
currently have similar kind of problems, although fixable.
The tradeoff is between a seemingly working, but inherently unsafe
operation, and a clear error message that keeps things safe.

The UEFI variable store is used for more and different things than you
mention, such as (in roughly decreasing order of importance):

* Some UEFI variables (the authenticated ones) have security impact.
This covers the standardized ones used for Secure Boot (Platform Key,
Key Exchange Keys, white-listed certificates and signatures (DB) and
black-listed certificates and signatures (DBX)).

* To my knowledge, it also includes some similar security-related
variables used by shim / MokManager (where "MOK" is short for Machine
Owner Key); that is, non-volatile variables to which shim delegates the
EFI binaries' verification, from the standardized Secure Boot interfaces.

* UEFI variables can serve as the backend for the linux "pstore"
(persistent store) file system. Pstore in turn can be used to save the
last part of dmesg on a crash. The messages can be re-read at a new boot.

* Firmware uses (reads/writes) a number of variables internally at each
boot. These may not be critical. One example is a variable that helps
reduce UEFI memory map fragmentation over a series of boots.

* OVMF manipulates UEFI boot options on each boot, according to the
bootindex properties (or more directly, according to the "bootorder"
fw_cfg file). Although, admittedly, this is likely the least risky
category of varstore contents.


While I have myself successfully used -- offline only -- internal
snapshots with OVMF guests, hand-waving away the knowledge that the
varstore was never actually snapshotted, I feel real uncomfortable about
silently performing an inherently lossy operation, especially when the
varstore may well have security impact.

Users will not read the documentation (they never do), and I would
rather not field future bug reports about obscure Secure Boot misbehavior.

It is ultimately up to libvirt developers, but IMO, if we continue to
allow this unsafe operation, then the minimum would be:

* in virt-manager, pop up an extra confirmation dialog, with clear
indication that the operation will be lossy and could have security impact,

* in virsh, reject the operation with a similar error message, unless
"--force" or something similar were specified.

* And, because there are other (independent) libvirt client
applications, this would likely require a new flag on the libvirt API
level, so that libvirt itself can reject unsafe snapshotting requests,
regardless of the client application that submits it.

I agree that usability regressions are not nice and will likely generate
bug reports; however, *if* they are in direct conflict with security
improvements, then security improvements trump usability regressions. I
guess we can allow users to ignore security, but they need to be
informed on the spot, and they have to opt in.

I would prefer if we went ahead with this patch; but, again, it's up to
you in the end.

Thanks!
Laszlo