On Thu, Oct 11, 2018 at 13:06:14 +0100, Dr. David Alan Gilbert wrote:
* Peter Krempa (pkrempa(a)redhat.com) wrote:
> On Tue, Oct 09, 2018 at 19:34:43 +0200, Markus Armbruster wrote:
> > Cc: libvir-list for review of the compatibility argument below.
> >
> > Daniel Henrique Barboza <danielhb413(a)gmail.com> writes:
> >
> > > Hey David,
> > >
> > > On 9/21/18 9:29 AM, Dr. David Alan Gilbert wrote:
> > >> * Daniel Henrique Barboza (danielhb413(a)gmail.com) wrote:
> > >>> changes in v2:
> > >>> - removed the "RFC" marker;
> > >>> - added a new patch (patch 2) that removes
> > >>> bdrv_snapshot_delete_by_id_or_name from the code;
> > >>> - made changes in patch 1 as suggested by Murilo;
> > >>> - previous patch set link:
> > >>>
https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html
> > >>>
> > >>>
> > >>> It is not uncommon to see bugs being opened by testers that
attempt to
> > >>> create VM snapshots using HMP. It turns out that "0" and
"1" are quite
> > >>> common snapshot names and they trigger a lot of bugs. I gave an
example
> > >>> in the commit message of patch 1, but to sum up here: QEMU treats
the
> > >>> input of savevm/loadvm/delvm sometimes as 'ID', sometimes
as 'name'. It
> > >>> is documented as such, but this can lead to strange situations.
> > >>>
> > >>> Given that it is strange for an API to consider a parameter to be
2 fields
> > >>> at the same time, and inadvently treating them as one or the
other, and
> > >>> that removing the ID field is too drastic, my idea here is to keep
the
> > >>> ID field for internal control, but do not let the user set it.
> > >>>
> > >>> I guess there's room for discussion about considering this
change an API
> > >>> change or not. It doesn't affect users of HMP and it
doesn't affect Libvirt,
> > >>> but simplifying the meaning of the parameters of
savevm/loadvm/delvm.
> > >> Can you clarify a couple of things:
> > >> a) What is it about libvirt's use that means it's OK ? Does
it never
> > >> use numeric tags?
> > >
> > > I am glad you asked because my understanding in how Libvirt was dealing
> > > with snapshots was wrong, and I just looked into it further to answer
your
> > > question. Luckily, this series fixes the situation for Libvirt as well.
> > >
> > > I was misled by the fact that Libvirt does not show the same symptoms
> > > we see in
> > > QEMU of this problem, but the bug is there. Here's a quick test with
> > > Libvirt with
> > > "0" and "1" as snapshot names, considering a VM named
with no snapshots,
> > > using QEMU 2.12 and Libvirt 4.0.0:
> > >
> > > - create the "0" snapshot:
> > >
> > > $ sudo virsh snapshot-create-as --name 0 dhb
> > > Domain snapshot 0 created
> > >
> > > $ sudo virsh snapshot-list dhb
> > > Name Creation Time State
> > > ------------------------------------------------------------
> > > 0 2018-09-24 15:47:56 -0400 running
> > >
> > > $ sudo virsh qemu-monitor-command dhb --hmp info snapshots
> > > List of snapshots present on all disks:
> > > ID TAG VM SIZE DATE VM CLOCK
> > > -- 0 405M 2018-09-24 15:47:56 00:04:20.867
> > >
> > >
> > > - created the "1" snapshot. Here, Libvirt shows both snapshots
with
> > > snapshot-list,
> > > but the snapshot was erased inside QEMU:
> > >
> > > $ sudo virsh snapshot-create-as --name 1 dhb
> > > Domain snapshot 1 created
> > > $
> > > $ sudo virsh snapshot-list dhb
> > > Name Creation Time State
> > > ------------------------------------------------------------
> > > 0 2018-09-24 15:47:56 -0400 running
> > > 1 2018-09-24 15:50:09 -0400 running
> > >
> > > $ sudo virsh qemu-monitor-command dhb --hmp info snapshots
> > > List of snapshots present on all disks:
> > > ID TAG VM SIZE DATE VM CLOCK
> > > -- 1 404M 2018-09-24 15:50:10 00:05:36.226
> > >
> > >
> > > This is where I stopped checking out Libvirt at first, concluding
> > > wrongly that it
> > > was immune to the bug.
> > >
> > > Libvirt does not throw an error when trying to apply snapshot 0. In
> > > fact, it acts
> > > like everything went fine:
> > >
> > > $ sudo virsh snapshot-revert dhb 0
> > >
> > > $ echo $?
> > > 0
> >
> > Is that a libvirt bug?
>
> Probably yes. The error handling from HMP sucks and can't really be
> fixed in all cases. This is for the handler which calls "loadvm":
>
> if (strstr(reply, "No block device supports snapshots") != NULL) {
> virReportError(VIR_ERR_OPERATION_INVALID, "%s",
> _("this domain does not have a device to load
snapshots"));
> goto cleanup;
> } else if (strstr(reply, "Could not find snapshot") != NULL) {
> virReportError(VIR_ERR_OPERATION_INVALID,
> _("the snapshot '%s' does not exist, and was not
loaded"),
> name);
> goto cleanup;
> } else if (strstr(reply, "Snapshots not supported on device") != NULL)
{
> virReportError(VIR_ERR_OPERATION_INVALID, "%s", reply);
> goto cleanup;
> } else if (strstr(reply, "Could not open VM state file") != NULL) {
> virReportError(VIR_ERR_OPERATION_FAILED, "%s", reply);
> goto cleanup;
> } else if (strstr(reply, "Error") != NULL
> && strstr(reply, "while loading VM state") != NULL)
{
> virReportError(VIR_ERR_OPERATION_FAILED, "%s", reply);
> goto cleanup;
> } else if (strstr(reply, "Error") != NULL
> && strstr(reply, "while activating snapshot on") !=
NULL) {
> virReportError(VIR_ERR_OPERATION_FAILED, "%s", reply);
> goto cleanup;
> }
>
> If the above does not match the reported error, we report success. The
> same problem can happen with any of the 5 [1] HMP command handlers we
> still have.
>
> Note that a similar abomination is also for the code which calls
> "savevm".
Would the following small qemu change make life a little safer:
diff --git a/hmp.c b/hmp.c
index 576253a01f..0729a8c7ed 100644
--- a/hmp.c
+++ b/hmp.c
@@ -62,7 +62,7 @@ static void hmp_handle_error(Monitor *mon, Error **errp)
{
assert(errp);
if (*errp) {
- error_report_err(*errp);
+ error_reportf_err(*errp, "Error: ");
}
}
changing:
No block device supports snapshots
to:
Error: No block device supports snapshots
so you could check for Error: at the start and know that you've hit some
unknown error?
That certainly would help, provided that the message is not localized in
any way. I'd be very glad to see it also for 'savevm' and 'delvm'.
It's a shame that drive_add and drive_del don't have that either, but
they will soon become unused.