[libvirt] [PATCH/RFC]: don't ignore errors to save the domain status file

Hi, we currently don't report errors to save qemu's domain status file back to the caller. That was o.k. as long as the code was there for testing but now that the XML is being picked up on daemon restart we must handle these. The attached patch does that, although I'm not confident that it's enough. While we return error on device attach/unattach the device actually got attached/unattached already. Same is true for suspend/resume. A better solution would be to write out the new domain status into a temp file and simply call rename(2) on it when the attach/unattach succeeds and discard it if it fails. This would minimize the possible error conditions like disk full, missing permissions to write into that directory etc. Does this sound reasonable? O.k. to apply the attached version for now? Cheers, -- Guido

On Sat, Jan 31, 2009 at 02:32:09PM +0100, Guido Günther wrote:
we currently don't report errors to save qemu's domain status file back to the caller. That was o.k. as long as the code was there for testing but now that the XML is being picked up on daemon restart we must handle these. Any comments on this one? -- Guido

On Sat, Jan 31, 2009 at 02:32:09PM +0100, Guido G?nther wrote:
Hi, we currently don't report errors to save qemu's domain status file back to the caller. That was o.k. as long as the code was there for testing but now that the XML is being picked up on daemon restart we must handle these. The attached patch does that, although I'm not confident that it's enough. While we return error on device attach/unattach the device actually got attached/unattached already. Same is true for suspend/resume. A better solution would be to write out the new domain status into a temp file and simply call rename(2) on it when the attach/unattach succeeds and discard it if it fails. This would minimize the possible error conditions like disk full, missing permissions to write into that directory etc. Does this sound reasonable?
Yeah, I think that sounds like a reasonable idea virPrepareStatusFile() virSaveStatusFile() the former creates the temporary file, and the latter does the rename(). In the prepare stage, we could also do a access(R_OK) check on the eventual target, as a sanity check that we're likely to be able to complete the rename() later. Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

Guido Günther <agx@sigxcpu.org> wrote:
we currently don't report errors to save qemu's domain status file back to the caller. That was o.k. as long as the code was there for testing but now that the XML is being picked up on daemon restart we must handle these. The attached patch does that, although I'm not confident that it's enough. While we return error on device attach/unattach the device actually got attached/unattached already. Same is true for suspend/resume. A better solution would be to write out the new domain status into a temp file and simply call rename(2) on it when the attach/unattach succeeds and discard it if it fails. This would minimize the possible error conditions like disk full, missing permissions to write into that directory etc. Does this sound reasonable?
Yes.
O.k. to apply the attached version for now?
Looks good to me. Minor formatting nits:
From 45feca32cc5ac6e48b2b586771736050202fc119 Mon Sep 17 00:00:00 2001 From: =?utf-8?q?Guido=20G=C3=BCnther?= <agx@sigxcpu.org> Date: Sat, 31 Jan 2009 13:10:05 +0100 Subject: [PATCH] don't ignore errors to save the domain status
--- src/qemu_driver.c | 23 ++++++++++++++++------- 1 files changed, 16 insertions(+), 7 deletions(-)
diff --git a/src/qemu_driver.c b/src/qemu_driver.c index 09f69bf..3933dfa 100644 --- a/src/qemu_driver.c +++ b/src/qemu_driver.c @@ -1280,12 +1280,12 @@ static int qemudStartVMDaemon(virConnectPtr conn, if ((qemudWaitForMonitor(conn, driver, vm, pos) < 0) || (qemudDetectVcpuPIDs(conn, vm) < 0) || (qemudInitCpus(conn, vm, migrateFrom) < 0) || - (qemudInitPasswords(conn, driver, vm) < 0)) { + (qemudInitPasswords(conn, driver, vm) < 0) || + (qemudSaveDomainStatus(conn, qemu_driver, vm) < 0)) { qemudShutdownVMDaemon(conn, driver, vm); return -1; } } - qemudSaveDomainStatus(conn, qemu_driver, vm);
return ret; } @@ -1319,7 +1319,10 @@ static void qemudShutdownVMDaemon(virConnectPtr conn ATTRIBUTE_UNUSED, /* shut it off for sure */ virKillProcess(vm->pid, SIGKILL);
- qemudRemoveDomainStatus(conn, driver, vm); + if (qemudRemoveDomainStatus(conn, driver, vm) < 0) { + qemudLog(QEMUD_WARN, _("Failed to remove domain status for %s"), + vm->def->name); + } vm->pid = -1; vm->def->id = -1; vm->state = VIR_DOMAIN_SHUTOFF; @@ -1972,7 +1975,8 @@ static int qemudDomainSuspend(virDomainPtr dom) { VIR_DOMAIN_EVENT_SUSPENDED_PAUSED); VIR_FREE(info); } - qemudSaveDomainStatus(dom->conn, driver, vm); + if (qemudSaveDomainStatus(dom->conn, driver, vm) < 0) + goto cleanup;
s/ //
ret = 0;
cleanup: @@ -2022,7 +2026,8 @@ static int qemudDomainResume(virDomainPtr dom) { VIR_DOMAIN_EVENT_RESUMED_UNPAUSED); VIR_FREE(info); } - qemudSaveDomainStatus(dom->conn, driver, vm); + if (qemudSaveDomainStatus(dom->conn, driver, vm) < 0) + goto cleanup;
Likewise.
ret = 0;
cleanup: @@ -3461,7 +3466,9 @@ static int qemudDomainAttachDevice(virDomainPtr dom, goto cleanup; }
- qemudSaveDomainStatus(dom->conn, driver, vm); + if (!ret && qemudSaveDomainStatus(dom->conn, driver, vm) < 0) + ret = -1; + cleanup: if (ret < 0) virDomainDeviceDefFree(dev); @@ -3578,7 +3585,9 @@ static int qemudDomainDetachDevice(virDomainPtr dom, qemudReportError(dom->conn, dom, NULL, VIR_ERR_NO_SUPPORT, "%s", _("only SCSI or virtio disk device can be detached dynamically"));
- qemudSaveDomainStatus(dom->conn, driver, vm); + if (!ret && qemudSaveDomainStatus(dom->conn, driver, vm) < 0) + ret = -1; + cleanup: virDomainDeviceDefFree(dev); if (vm)

On Thu, Feb 05, 2009 at 06:49:52PM +0100, Jim Meyering wrote:
Guido Günther <agx@sigxcpu.org> wrote:
we currently don't report errors to save qemu's domain status file back to the caller. That was o.k. as long as the code was there for testing but now that the XML is being picked up on daemon restart we must handle these. The attached patch does that, although I'm not confident that it's enough. While we return error on device attach/unattach the device actually got attached/unattached already. Same is true for suspend/resume. A better solution would be to write out the new domain status into a temp file and simply call rename(2) on it when the attach/unattach succeeds and discard it if it fails. This would minimize the possible error conditions like disk full, missing permissions to write into that directory etc. Does this sound reasonable?
Yes.
O.k. to apply the attached version for now?
Looks good to me. Minor formatting nits: I've applied it with your indentation suggestions fixed. -- Guido
participants (3)
-
Daniel P. Berrange
-
Guido Günther
-
Jim Meyering