[Libvir] save/restore support for KVM

Hi, I've implemented save/restore support for KVM using the live migration feature. First, a few patches to KVM to fix bugs: Subject: [PATCH 1/3] qemu: fix freed pointer dereference http://article.gmane.org/gmane.comp.emulators.kvm.devel/5572 Subject: [PATCH 2/3] qemu: don't start a new migration if one is already in progress http://article.gmane.org/gmane.comp.emulators.kvm.devel/5575 Subject: [PATCH 3/3] qemu: reset buffer pointers after CR/LF http://article.gmane.org/gmane.comp.emulators.kvm.devel/5573 (If compatibility with old KVM is wanted, it might be possible to work around the kvm bugs in other ways, but I'm not sure). Then, another patch to KVM to support inbound migration from a filename. It already supports migration from stdin, but adding this seemed easier from a libvirt perspective. Subject: [PATCH] qemu: accept filename for incoming migration http://article.gmane.org/gmane.comp.emulators.kvm.devel/5590 Finally, the libvirt side. Some notes: - Arbitrary decisions about VM status: I pause the VM before starting migration, and destroy it once it's finished. Neither is required by KVM but I'd be concerned about the disk image state otherwise. Also, after resuming, the VM is still paused. I don't know how Xen behaves in this regard but the behavior here is easy enough to change. - I append the domain's UUID at the end of the migration image. This doesn't affect KVM at all (it ignores the extra data). Does that seem reasonable? It's unclear how the saved image is supposed to get associated with a particular VM configuration without doing something like this. - I added the migrateFrom field to the qemu_vm struct. Dunno if that's the best place. Could also add put it in qemu_vm_def, or add parameters to qemudStartVMDaemon, etc.. Patch against current CVS is below. Signed-off-by: Jim Paris <jim@jtan.com> -jim Index: src/qemu_conf.c =================================================================== RCS file: /data/cvs/libvirt/src/qemu_conf.c,v retrieving revision 1.10 diff -u -r1.10 qemu_conf.c --- src/qemu_conf.c 7 Aug 2007 13:02:35 -0000 1.10 +++ src/qemu_conf.c 9 Aug 2007 21:15:10 -0000 @@ -1518,7 +1518,8 @@ (vm->def->os.initrd[0] ? 2 : 0) + /* initrd */ (vm->def->os.cmdline[0] ? 2 : 0) + /* cmdline */ (vm->def->graphicsType == QEMUD_GRAPHICS_VNC ? 2 : - (vm->def->graphicsType == QEMUD_GRAPHICS_SDL ? 0 : 1)); /* graphics */ + (vm->def->graphicsType == QEMUD_GRAPHICS_SDL ? 0 : 1)) + /* graphics */ + (vm->migrateFrom[0] ? 3 : 0); /* migrateFrom */ snprintf(memory, sizeof(memory), "%d", vm->def->memory/1024); snprintf(vcpus, sizeof(vcpus), "%d", vm->def->vcpus); @@ -1767,6 +1768,15 @@ /* SDL is the default. no args needed */ } + if (vm->migrateFrom[0]) { + if (!((*argv)[++n] = strdup("-S"))) + goto no_memory; + if (!((*argv)[++n] = strdup("-incoming"))) + goto no_memory; + if (!((*argv)[++n] = strdup(vm->migrateFrom))) + goto no_memory; + } + (*argv)[++n] = NULL; return 0; Index: src/qemu_conf.h =================================================================== RCS file: /data/cvs/libvirt/src/qemu_conf.h,v retrieving revision 1.6 diff -u -r1.6 qemu_conf.h --- src/qemu_conf.h 30 Jul 2007 09:59:06 -0000 1.6 +++ src/qemu_conf.h 9 Aug 2007 21:15:10 -0000 @@ -213,6 +213,7 @@ char configFile[PATH_MAX]; char autostartLink[PATH_MAX]; + char migrateFrom[PATH_MAX]; struct qemud_vm_def *def; /* The current definition */ struct qemud_vm_def *newDef; /* New definition to activate at shutdown */ Index: src/qemu_driver.c =================================================================== RCS file: /data/cvs/libvirt/src/qemu_driver.c,v retrieving revision 1.14 diff -u -r1.14 qemu_driver.c --- src/qemu_driver.c 30 Jul 2007 09:59:06 -0000 1.14 +++ src/qemu_driver.c 9 Aug 2007 21:15:10 -0000 @@ -656,7 +656,7 @@ if (virExecNonBlock(conn, argv, &vm->pid, &vm->stdout, &vm->stderr) == 0) { vm->id = driver->nextvmid++; - vm->state = VIR_DOMAIN_RUNNING; + vm->state = vm->migrateFrom[0] ? VIR_DOMAIN_PAUSED : VIR_DOMAIN_RUNNING; driver->ninactivevms--; driver->nactivevms++; @@ -1852,29 +1852,151 @@ return 0; } +#define QEMUD_SAVE_MAGIC "LibvirtQemudUUID" -static int qemudDomainSave(virDomainPtr dom, - const char *path ATTRIBUTE_UNUSED) { +static int qemudDomainSave(virDomainPtr dom, + const char *path) { struct qemud_driver *driver = (struct qemud_driver *)dom->conn->privateData; struct qemud_vm *vm = qemudFindVMByID(driver, dom->id); + char *command, *info; + int fd; + if (!vm) { - qemudReportError(dom->conn, dom, NULL, VIR_ERR_INVALID_DOMAIN, "no domain with matching id %d", dom->id); + qemudReportError(dom->conn, dom, NULL, VIR_ERR_INVALID_DOMAIN, + "no domain with matching id %d", dom->id); return -1; } + if (!qemudIsActiveVM(vm)) { - qemudReportError(dom->conn, dom, NULL, VIR_ERR_OPERATION_FAILED, "domain is not running"); + qemudReportError(dom->conn, dom, NULL, VIR_ERR_OPERATION_FAILED, + "domain is not running"); return -1; } - qemudReportError(dom->conn, dom, NULL, VIR_ERR_OPERATION_FAILED, "save is not supported"); - return -1; + + if (strchr(path, '\'') || strchr(path, '\\') ) { + qemudReportError(dom->conn, dom, NULL, VIR_ERR_OPERATION_FAILED, + "invalid filename"); + return -1; + } + + /* Pause */ + if (qemudDomainSuspend(dom) != 0) { + qemudReportError(dom->conn, dom, NULL, VIR_ERR_OPERATION_FAILED, + "failed to pause domain"); + return -1; + } + + /* Migrate to file. */ + if (asprintf (&command, "migrate \"exec:dd of='%s' 2>/dev/null\"\n", + path) == -1) { + qemudReportError(dom->conn, dom, NULL, VIR_ERR_OPERATION_FAILED, + "out of memory"); + return -1; + } + + if (qemudMonitorCommand(driver, vm, command, &info) < 0) { + qemudReportError(dom->conn, dom, NULL, VIR_ERR_OPERATION_FAILED, + "migrate operation failed"); + free(command); + return -1; + } + + free(info); + free(command); + + /* Append the UUID, which we'll need in order to restore the domain. */ + if ((fd = open(path, O_APPEND|O_WRONLY)) < 0) { + qemudReportError(dom->conn, dom, NULL, VIR_ERR_OPERATION_FAILED, + "failed to open '%s'", path); + return -1; + } + + if (write(fd, QEMUD_SAVE_MAGIC, sizeof(QEMUD_SAVE_MAGIC)) != + sizeof(QEMUD_SAVE_MAGIC)) { + qemudReportError(dom->conn, dom, NULL, VIR_ERR_OPERATION_FAILED, + "failed to write magic"); + close(fd); + return -1; + } + + if (write(fd, dom->uuid, VIR_UUID_BUFLEN) != VIR_UUID_BUFLEN) { + qemudReportError(dom->conn, dom, NULL, VIR_ERR_OPERATION_FAILED, + "failed to write uuid"); + close(fd); + return -1; + } + close(fd); + + /* Shut it down */ + qemudShutdownVMDaemon(dom->conn, driver, vm); + if (!vm->configFile[0]) + qemudRemoveInactiveVM(driver, vm); + + return 0; } static int qemudDomainRestore(virConnectPtr conn, - const char *path ATTRIBUTE_UNUSED) { - /*struct qemud_driver *driver = (struct qemud_driver *)conn->privateData;*/ - qemudReportError(conn, NULL, NULL, VIR_ERR_OPERATION_FAILED, "restore is not supported"); - return -1; + const char *path) { + struct qemud_driver *driver = (struct qemud_driver *)conn->privateData; + unsigned char uuid[VIR_UUID_BUFLEN]; + unsigned char magic[sizeof(QEMUD_SAVE_MAGIC)]; + virDomainPtr dom; + struct qemud_vm *vm; + int fd; + + /* Find the UUID */ + if ((fd = open(path, O_RDONLY)) < 0) { + qemudReportError(conn, NULL, NULL, VIR_ERR_OPERATION_FAILED, + "cannot read domain image"); + return -1; + } + if (lseek(fd, 0 - (sizeof(magic) + sizeof(uuid)), SEEK_END) == -1) { + qemudReportError(conn, NULL, NULL, VIR_ERR_OPERATION_FAILED, + "failed to seek to end of domain image"); + close(fd); + return -1; + } + if (read(fd, magic, sizeof(magic)) != sizeof(magic)) { + qemudReportError(conn, NULL, NULL, VIR_ERR_OPERATION_FAILED, + "failed to read domain image"); + close(fd); + return -1; + } + if (memcmp(magic, QEMUD_SAVE_MAGIC, sizeof(magic))) { + qemudReportError(conn, NULL, NULL, VIR_ERR_OPERATION_FAILED, + "image magic is not correct"); + close(fd); + return -1; + } + if (read(fd, uuid, sizeof(uuid)) != sizeof(uuid)) { + qemudReportError(conn, NULL, NULL, VIR_ERR_OPERATION_FAILED, + "failed to read domain UUID from image"); + close(fd); + return -1; + } + close(fd); + + /* Find the domain & vm */ + dom = qemudDomainLookupByUUID(conn, uuid); + if (!dom) { + qemudReportError(conn, NULL, NULL, VIR_ERR_INVALID_DOMAIN, + "no domain with matching uuid"); + return -1; + } + + vm = qemudFindVMByUUID(driver, dom->uuid); + if (!vm) { + qemudReportError(conn, dom, NULL, VIR_ERR_INVALID_DOMAIN, + "no vm with matching uuid"); + return -1; + } + + /* Set the migration source and start it up. */ + snprintf(vm->migrateFrom, sizeof(vm->migrateFrom), "file://%s", path); + vm->migrateFrom[sizeof(vm->migrateFrom)-1] = '\0'; + + return qemudStartVMDaemon(dom->conn, driver, vm); } @@ -1931,6 +2053,7 @@ return -1; } + vm->migrateFrom[0] = '\0'; return qemudStartVMDaemon(dom->conn, driver, vm); }

On Thu, Aug 09, 2007 at 05:26:43PM -0400, Jim Paris wrote:
Hi,
I've implemented save/restore support for KVM using the live migration feature. First, a few patches to KVM to fix bugs:
Subject: [PATCH 1/3] qemu: fix freed pointer dereference http://article.gmane.org/gmane.comp.emulators.kvm.devel/5572
Subject: [PATCH 2/3] qemu: don't start a new migration if one is already in progress http://article.gmane.org/gmane.comp.emulators.kvm.devel/5575
Subject: [PATCH 3/3] qemu: reset buffer pointers after CR/LF http://article.gmane.org/gmane.comp.emulators.kvm.devel/5573
(If compatibility with old KVM is wanted, it might be possible to work around the kvm bugs in other ways, but I'm not sure).
These are all genuine bugs so its not unreasonable to require that anyone using save/restore ensure they're patched.
Then, another patch to KVM to support inbound migration from a filename. It already supports migration from stdin, but adding this seemed easier from a libvirt perspective.
Subject: [PATCH] qemu: accept filename for incoming migration http://article.gmane.org/gmane.comp.emulators.kvm.devel/5590
Just been committed to KVM repos I see. Should be an easy patch to backport too. As long as we can detect failure if this is missing & report it back then I'm fine depending on this.
Finally, the libvirt side. Some notes:
- Arbitrary decisions about VM status: I pause the VM before starting migration, and destroy it once it's finished. Neither is required by KVM but I'd be concerned about the disk image state otherwise. Also, after resuming, the VM is still paused. I don't know how Xen behaves in this regard but the behavior here is easy enough to change.
Xen doesn't mind whether the VM is running or paused when saving it. Pausing it seems reasonable - its going to be stopped shortly anyway, so letting it continue running just increases the saved image size by having to process dirtied memory. As for restore, I'd be inclined to leave it running after restore to match our Xen driver.
- I append the domain's UUID at the end of the migration image. This doesn't affect KVM at all (it ignores the extra data). Does that seem reasonable? It's unclear how the saved image is supposed to get associated with a particular VM configuration without doing something like this.
Actually I'd store the entire XML config appended to the end of the image. Its quite possible the saved image may be restored on a different machine so libvirt will need the XML config there & its not much work to automatically append it all & use it when restoring later.
- I added the migrateFrom field to the qemu_vm struct. Dunno if that's the best place. Could also add put it in qemu_vm_def, or add parameters to qemudStartVMDaemon, etc..
Seems reasonable place for now - its not really permanent data so the main 'qemu_vm' struct is more appropriate than 'qemu_vm_def'. Regards, Dan. -- |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|

Daniel P. Berrange wrote:
- I append the domain's UUID at the end of the migration image. This doesn't affect KVM at all (it ignores the extra data). Does that seem reasonable? It's unclear how the saved image is supposed to get associated with a particular VM configuration without doing something like this.
Actually I'd store the entire XML config appended to the end of the image. Its quite possible the saved image may be restored on a different machine so libvirt will need the XML config there & its not much work to automatically append it all & use it when restoring later.
How does this (or even _does_ this) work in the Xen case? Does Xen save this info like UUID, name, etc. in the migration image? Rich. -- Emerging Technologies, Red Hat - http://et.redhat.com/~rjones/ Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SL4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 03798903

On Fri, Aug 10, 2007 at 11:06:02AM +0100, Richard W.M. Jones wrote:
Daniel P. Berrange wrote:
- I append the domain's UUID at the end of the migration image. This doesn't affect KVM at all (it ignores the extra data). Does that seem reasonable? It's unclear how the saved image is supposed to get associated with a particular VM configuration without doing something like this.
Actually I'd store the entire XML config appended to the end of the image. Its quite possible the saved image may be restored on a different machine so libvirt will need the XML config there & its not much work to automatically append it all & use it when restoring later.
How does this (or even _does_ this) work in the Xen case? Does Xen save this info like UUID, name, etc. in the migration image?
XenD stuffs the entire SEXPR config for the guest in the start of the saved image file & reads it out when restoring. So my suggestion is basically the same Regards, Dan. -- |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|

On Thu, Aug 09, 2007 at 10:55:10PM +0100, Daniel P. Berrange wrote:
On Thu, Aug 09, 2007 at 05:26:43PM -0400, Jim Paris wrote:
Hi,
I've implemented save/restore support for KVM using the live migration feature. First, a few patches to KVM to fix bugs:
Very cool :-)
Then, another patch to KVM to support inbound migration from a filename. It already supports migration from stdin, but adding this seemed easier from a libvirt perspective.
Subject: [PATCH] qemu: accept filename for incoming migration http://article.gmane.org/gmane.comp.emulators.kvm.devel/5590
Just been committed to KVM repos I see. Should be an easy patch to backport too. As long as we can detect failure if this is missing & report it back then I'm fine depending on this.
Would checking for the kvm version from the console sufficient ? Since KVM makes even more releases than libvirt in average I guess that would be fine.
Finally, the libvirt side. Some notes:
- Arbitrary decisions about VM status: I pause the VM before starting migration, and destroy it once it's finished. Neither is required by KVM but I'd be concerned about the disk image state otherwise. Also, after resuming, the VM is still paused. I don't know how Xen behaves in this regard but the behavior here is easy enough to change.
Xen doesn't mind whether the VM is running or paused when saving it. Pausing it seems reasonable - its going to be stopped shortly anyway, so letting it continue running just increases the saved image size by having to process dirtied memory. As for restore, I'd be inclined to leave it running after restore to match our Xen driver.
I must admit I feel more comfortable with pausing, reduces the complexity of the operation and could avoid nasty bugs near impossible to track.
- I append the domain's UUID at the end of the migration image. This doesn't affect KVM at all (it ignores the extra data). Does that seem reasonable? It's unclear how the saved image is supposed to get associated with a particular VM configuration without doing something like this.
Actually I'd store the entire XML config appended to the end of the image. Its quite possible the saved image may be restored on a different machine so libvirt will need the XML config there & its not much work to automatically append it all & use it when restoring later.
+1 . The only problem is that the XML has no predefined size, so it may be hard to stack more stuff behind it. I would ask first on the KVM list to check if it's okay to add a variable lenght data structure at the end, they might want to extend it in the future and that would be hard to handle. Daniel -- Red Hat Virtualization group http://redhat.com/virtualization/ Daniel Veillard | virtualization library http://libvirt.org/ veillard@redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

Daniel Veillard wrote:
On Thu, Aug 09, 2007 at 10:55:10PM +0100, Daniel P. Berrange wrote:
Just been committed to KVM repos I see. Should be an easy patch to backport too. As long as we can detect failure if this is missing & report it back then I'm fine depending on this.
Would checking for the kvm version from the console sufficient ? Since KVM makes even more releases than libvirt in average I guess that would be fine.
I'm not sure the kvm qemu binary even reports the kvm version anywhere. I'll ask on kvm-devel to see if qemu/VERSION could get updated with each KVM release.
- I append the domain's UUID at the end of the migration image. This doesn't affect KVM at all (it ignores the extra data). Does that seem reasonable? It's unclear how the saved image is supposed to get associated with a particular VM configuration without doing something like this.
Actually I'd store the entire XML config appended to the end of the image. Its quite possible the saved image may be restored on a different machine so libvirt will need the XML config there & its not much work to automatically append it all & use it when restoring later.
+1 . The only problem is that the XML has no predefined size, so it may be hard to stack more stuff behind it. I would ask first on the KVM list to check if it's okay to add a variable lenght data structure at the end, they might want to extend it in the future and that would be hard to handle.
I think appending unrelated data to the migration image is a bit of a hack anyway. A better plan would be a file containing <header> <XML config> <migration data> On save, libvirt writes <header> and <XML config>, then closes it and uses "dd of=path oflag=append conv=notrunc" or just "cat >> path" as the migration command. On restore, libvirt reads the header and XML config, and then feeds the remaining migration data to KVM using "-incoming stdio". I had wanted to avoid the trouble of feeding data via stdin, but maybe a well placed dup2(fd,STDIN_FILENO) would do the trick automatically. This file format would also make it easier for e.g. virt-manager to determine that a file is a valid libvirt restore image. -jim

On Fri, Aug 10, 2007 at 12:36:05PM -0400, Jim Paris wrote:
Daniel Veillard wrote:
On Thu, Aug 09, 2007 at 10:55:10PM +0100, Daniel P. Berrange wrote:
Just been committed to KVM repos I see. Should be an easy patch to backport too. As long as we can detect failure if this is missing & report it back then I'm fine depending on this.
Would checking for the kvm version from the console sufficient ? Since KVM makes even more releases than libvirt in average I guess that would be fine.
I'm not sure the kvm qemu binary even reports the kvm version anywhere. I'll ask on kvm-devel to see if qemu/VERSION could get updated with each KVM release.
- I append the domain's UUID at the end of the migration image. This doesn't affect KVM at all (it ignores the extra data). Does that seem reasonable? It's unclear how the saved image is supposed to get associated with a particular VM configuration without doing something like this.
Actually I'd store the entire XML config appended to the end of the image. Its quite possible the saved image may be restored on a different machine so libvirt will need the XML config there & its not much work to automatically append it all & use it when restoring later.
+1 . The only problem is that the XML has no predefined size, so it may be hard to stack more stuff behind it. I would ask first on the KVM list to check if it's okay to add a variable lenght data structure at the end, they might want to extend it in the future and that would be hard to handle.
I think appending unrelated data to the migration image is a bit of a hack anyway. A better plan would be a file containing <header> <XML config> <migration data>
On save, libvirt writes <header> and <XML config>, then closes it and uses "dd of=path oflag=append conv=notrunc" or just "cat >> path" as the migration command.
On restore, libvirt reads the header and XML config, and then feeds the remaining migration data to KVM using "-incoming stdio". I had wanted to avoid the trouble of feeding data via stdin, but maybe a well placed dup2(fd,STDIN_FILENO) would do the trick automatically.
That sounds like a reasonable plan
This file format would also make it easier for e.g. virt-manager to determine that a file is a valid libvirt restore image.
A little magic in the first couple of bytes would be handy - thats what we check currently for Xen. Dan. -- |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|

On Fri, Aug 10, 2007 at 12:36:05PM -0400, Jim Paris wrote:
if it's okay to add a variable lenght data structure at the end, they might want to extend it in the future and that would be hard to handle.
I think appending unrelated data to the migration image is a bit of a hack anyway. A better plan would be a file containing <header> <XML config> <migration data>
Hum, beware you then need to make sure that you indicate in the header the lenght of the XML config, because you can't ask the XML parser to tell you when the XML file ends. Should not be hard but it's a pitfall I have seen too many people trip on :-)
On save, libvirt writes <header> and <XML config>, then closes it and uses "dd of=path oflag=append conv=notrunc" or just "cat >> path" as the migration command.
On restore, libvirt reads the header and XML config, and then feeds the remaining migration data to KVM using "-incoming stdio". I had wanted to avoid the trouble of feeding data via stdin, but maybe a well placed dup2(fd,STDIN_FILENO) would do the trick automatically.
This file format would also make it easier for e.g. virt-manager to determine that a file is a valid libvirt restore image.
Or just to be able to get informations from an image without having to scan in the restore image to find the index etc. Putting (some) metadata first is usually a good thing when designing a compound format. I feel way better with such a proposal. Daniel -- Red Hat Virtualization group http://redhat.com/virtualization/ Daniel Veillard | virtualization library http://libvirt.org/ veillard@redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

Daniel Veillard wrote:
On Fri, Aug 10, 2007 at 12:36:05PM -0400, Jim Paris wrote:
I think appending unrelated data to the migration image is a bit of a hack anyway. A better plan would be a file containing <header> <XML config> <migration data>
Hum, beware you then need to make sure that you indicate in the header the lenght of the XML config, because you can't ask the XML parser to tell you when the XML file ends. Should not be hard but it's a pitfall I have seen too many people trip on :-)
Remembering to null-terminate the XML config before passing it to qemudParseVMDef is also helpful :) I have it mostly working now. Saving works fine, but restoring is a bit buggy with regards to multiple copies of the same domain, config files, etc. This is what I'm using for resuming: xml = read XML def = qemudParseVMDef(xml) vm = qemudAssignVMDef(def) if (qemudStartVMDaemon(vm) < 0) qemudRemoveInactiveVM(vm) My questions: Should I add qemudSaveVMDef somewhere? Currently, if I undefine a domain and then restore from an image, I don't think there's any way to then save that domain back to a config file. What should happen if one tries to resume the same image twice? Currently I think qemudAssignVMDef returns a pointer to the first vm, the start fails, and then RemoveInactiveVM removes the running one -- not good. Should I try to detect when the name/UUID already exists and refuse? The "test" driver seems to also be broken in this regard: $ virsh --connect test:///default virsh > save test /tmp/test virsh > restore /tmp/test virsh > shutdown 2 virsh > list --all Id Name State ---------------------------------- - test shut off - test shut off There's no way to uniquely refer to either of those now. Does the xen driver handle this any better? Also, is there a reason qemudAssignVMDef match by name rather than UUID? Are names supposed to be unique? -jim

On Sat, Aug 11, 2007 at 03:16:38PM -0400, Jim Paris wrote:
Daniel Veillard wrote:
On Fri, Aug 10, 2007 at 12:36:05PM -0400, Jim Paris wrote:
I think appending unrelated data to the migration image is a bit of a hack anyway. A better plan would be a file containing <header> <XML config> <migration data>
Hum, beware you then need to make sure that you indicate in the header the lenght of the XML config, because you can't ask the XML parser to tell you when the XML file ends. Should not be hard but it's a pitfall I have seen too many people trip on :-)
Remembering to null-terminate the XML config before passing it to qemudParseVMDef is also helpful :)
I have it mostly working now. Saving works fine, but restoring is a bit buggy with regards to multiple copies of the same domain, config files, etc. This is what I'm using for resuming:
xml = read XML def = qemudParseVMDef(xml) vm = qemudAssignVMDef(def) if (qemudStartVMDaemon(vm) < 0) qemudRemoveInactiveVM(vm)
My questions:
Should I add qemudSaveVMDef somewhere? Currently, if I undefine a domain and then restore from an image, I don't think there's any way to then save that domain back to a config file.
There's no need to call saveVMDef - the current save/restore API is what I term 'unmanaged'. So if the VM already has a persistent config on disk and you restore it, there's no need to save it. If the VM doesn't have a persistent config, then the original domain was 'transient' and the new domain will also be 'transient' so no need to save in this case either.
What should happen if one tries to resume the same image twice?
If we see an existing VM running with matching name or UUID, then it should cause an error. If there is an existing config then it'll simply be overwritten. Now in general restoring twice is only safe if the admin is doing snapshotting of the disks - we've no way of knowing this so just have to assume the admin knows what they're doing if they restore twice (or more).
Currently I think qemudAssignVMDef returns a pointer to the first vm, the start fails, and then RemoveInactiveVM removes the running one -- not good. Should I try to detect when the name/UUID already exists and refuse?
Yes, refuse if there is an active VM with maching name or UUID.
The "test" driver seems to also be broken in this regard: $ virsh --connect test:///default virsh > save test /tmp/test virsh > restore /tmp/test virsh > shutdown 2 virsh > list --all Id Name State ---------------------------------- - test shut off - test shut off
There's no way to uniquely refer to either of those now.
That's a bug in the test driver - i'll sort it out.
Does the xen driver handle this any better?
Not sure.
Also, is there a reason qemudAssignVMDef match by name rather than UUID? Are names supposed to be unique?
Both names & UUIDs should be considered unique. As for why AssignVMDef matches by name I can't really recall any particular reason. Regards, Dan. -- |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|

Hi Daniel, Thanks for the quick reply. Things seem to be working well now, Main changes from last time: - Fixed issues with the QEMU monitor interface: - Due to the TTY layer, sending "\n" to the qemu monitor translates into "\r\n" when received. This triggers a bug in older versions of QEMU (KVM <= 33) because the same command is executed twice, and still has problems with fixed QEMU because the "(qemu)" prompt is printed twice. Switch all monitor commands to end with "\r" which avoids both issues. - The QEMU monitor sends frequent terminal escape sequences, typically \033[D and \033[K. At times, these interfere with the prompt detection when they get sent between "\n" and "(qemu) ". Fix the issue by filtering out these sequences when they are received. - Migration data is fed to QEMU via stdin rather than with a filename. This allows us to feed it just the data it needs. With this and the monitor fixes, I think save/restore should now work with KVM <= 33, as a nice side effect. - Fixed quoting issues. Since the filename is passed to the shell in single quotes, the filename just needs to have ' replaced with '\'' . Also, the QEMU monitor itself requires that we use escape sequences \r, \n, \", and \\ for the corresponding characters. - New save file format, with header, XML, migration stream. Like Xen, the saved image starts with a 16-byte magic, in this case "LibvirtQemudSave". - The paused/running state of the VM before saving is stored in the save file, and the restored VM is put into the same state. I can remove this and just unconditionally resume the machine as mentioned earlier, but being able to restore paused is very useful especially for testing without disk snapshots. resume test save test /tmp/test restore /tmp/test -> starts resumed suspend test save test /tmp/test restore /tmp/test -> starts suspended (As before, the VM is still always paused as part of the migration itself) - Refuses to restore if the name or UUID is already active. Since it's getting a bit large I've split the patches up and will send them as replies to this message. -jim

Signed-off-by: Jim Paris <jim@jtan.com> --- src/qemu_driver.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/src/qemu_driver.c b/src/qemu_driver.c index 7c75d9c..b05c3f6 100644 --- a/src/qemu_driver.c +++ b/src/qemu_driver.c @@ -204,6 +204,7 @@ qemudStartup(void) { qemudShutdown(); qemudAutostartConfigs(qemu_driver); + free(base); return 0; snprintf_error: -- 1.5.3.rc4

On Sun, Aug 12, 2007 at 07:11:33PM -0400, Jim Paris wrote:
Signed-off-by: Jim Paris <jim@jtan.com> --- src/qemu_driver.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/src/qemu_driver.c b/src/qemu_driver.c index 7c75d9c..b05c3f6 100644 --- a/src/qemu_driver.c +++ b/src/qemu_driver.c @@ -204,6 +204,7 @@ qemudStartup(void) { qemudShutdown(); qemudAutostartConfigs(qemu_driver);
+ free(base); return 0;
snprintf_error:
Okay, uncontroversial bug, applied, will commit shortly, thanks ! Daniel -- Red Hat Virtualization group http://redhat.com/virtualization/ Daniel Veillard | virtualization library http://libvirt.org/ veillard@redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

Due to the TTY layer, sending "\n" to the qemu monitor translates into "\r\n" when received. This triggers a bug in older versions of QEMU (KVM <= 33) because the same command is executed twice, and still has problems with fixed QEMU because the "(qemu)" prompt is printed twice. Switch all monitor commands to end with "\r" which avoids both issues. The QEMU monitor sends frequent terminal escape sequences, typically \033[D and \033[K. At times, these interfere with the prompt detection when they get sent between "\n" and "(qemu) ". Fix the issue by filtering out these sequences when they are received. Signed-off-by: Jim Paris <jim@jtan.com> --- src/qemu_driver.c | 21 ++++++++++++++++----- 1 files changed, 16 insertions(+), 5 deletions(-) diff --git a/src/qemu_driver.c b/src/qemu_driver.c index b05c3f6..8063ad2 100644 --- a/src/qemu_driver.c +++ b/src/qemu_driver.c @@ -1306,12 +1306,14 @@ static int qemudMonitorCommand(struct qemud_driver *driver ATTRIBUTE_UNUSED, for (;;) { struct pollfd fd = { vm->monitor, POLLIN | POLLERR | POLLHUP, 0 }; char *tmp; + int skip = 0; /* Read all the data QEMU has sent thus far */ for (;;) { char data[1024]; int got = read(vm->monitor, data, sizeof(data)); char *b; + int i; if (got == 0) { if (buf) @@ -1333,14 +1335,23 @@ static int qemudMonitorCommand(struct qemud_driver *driver ATTRIBUTE_UNUSED, return -1; } buf = b; - memmove(buf+size, data, got); - buf[size+got] = '\0'; - size += got; + + /* Copy data, skipping 3-byte escape sequences */ + for (i = 0; i < got; i++) { + if (data[i] == '\033') + skip = 3; + if (skip) + skip--; + else + buf[size++] = data[i]; + } + buf[size] = '\0'; } if (buf) qemudDebug("Mon [%s]", buf); /* Look for QEMU prompt to indicate completion */ if (buf && ((tmp = strstr(buf, "\n(qemu) ")) != NULL)) { + fprintf(stderr,"got qemu\n"); tmp[0] = '\0'; break; } @@ -1755,7 +1766,7 @@ static int qemudDomainSuspend(virDomainPtr dom) { if (vm->state == VIR_DOMAIN_PAUSED) return 0; - if (qemudMonitorCommand(driver, vm, "stop\n", &info) < 0) { + if (qemudMonitorCommand(driver, vm, "stop\r", &info) < 0) { qemudReportError(dom->conn, dom, NULL, VIR_ERR_OPERATION_FAILED, "suspend operation failed"); return -1; } @@ -1780,7 +1791,7 @@ static int qemudDomainResume(virDomainPtr dom) { } if (vm->state == VIR_DOMAIN_RUNNING) return 0; - if (qemudMonitorCommand(driver, vm, "cont\n", &info) < 0) { + if (qemudMonitorCommand(driver, vm, "cont\r", &info) < 0) { qemudReportError(dom->conn, dom, NULL, VIR_ERR_OPERATION_FAILED, "resume operation failed"); return -1; } -- 1.5.3.rc4

On Sun, Aug 12, 2007 at 07:11:34PM -0400, Jim Paris wrote:
Due to the TTY layer, sending "\n" to the qemu monitor translates into "\r\n" when received. This triggers a bug in older versions of QEMU (KVM <= 33) because the same command is executed twice, and still has problems with fixed QEMU because the "(qemu)" prompt is printed twice. Switch all monitor commands to end with "\r" which avoids both issues.
The QEMU monitor sends frequent terminal escape sequences, typically \033[D and \033[K. At times, these interfere with the prompt detection when they get sent between "\n" and "(qemu) ". Fix the issue by filtering out these sequences when they are received.
I think DanP can better comment on the QEmu interaction than me, but the patch looks simple and clean except:
@@ -1333,14 +1335,23 @@ static int qemudMonitorCommand(struct qemud_driver *driver ATTRIBUTE_UNUSED, return -1; } buf = b; - memmove(buf+size, data, got); - buf[size+got] = '\0'; - size += got; + + /* Copy data, skipping 3-byte escape sequences */ + for (i = 0; i < got; i++) { + if (data[i] == '\033') + skip = 3; + if (skip) + skip--; + else + buf[size++] = data[i]; + } + buf[size] = '\0'; }
It seems that if for some reason you do a partial read on the QEmu console descriptor ending in the middle of the escape command you may have a problem. But it's possible that the way reads are done, and input is chunked garantees that this won't happen, DanP can probably confirm it's just fine :-) Daniel -- Red Hat Virtualization group http://redhat.com/virtualization/ Daniel Veillard | virtualization library http://libvirt.org/ veillard@redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

On Mon, Aug 13, 2007 at 05:59:15AM -0400, Daniel Veillard wrote:
On Sun, Aug 12, 2007 at 07:11:34PM -0400, Jim Paris wrote:
Due to the TTY layer, sending "\n" to the qemu monitor translates into "\r\n" when received. This triggers a bug in older versions of QEMU (KVM <= 33) because the same command is executed twice, and still has problems with fixed QEMU because the "(qemu)" prompt is printed twice. Switch all monitor commands to end with "\r" which avoids both issues.
The QEMU monitor sends frequent terminal escape sequences, typically \033[D and \033[K. At times, these interfere with the prompt detection when they get sent between "\n" and "(qemu) ". Fix the issue by filtering out these sequences when they are received.
I think DanP can better comment on the QEmu interaction than me, but the patch looks simple and clean except:
It looks sane to me - I had no idea QEMU was sending this escape sequences.
@@ -1333,14 +1335,23 @@ static int qemudMonitorCommand(struct qemud_driver *driver ATTRIBUTE_UNUSED, return -1; } buf = b; - memmove(buf+size, data, got); - buf[size+got] = '\0'; - size += got; + + /* Copy data, skipping 3-byte escape sequences */ + for (i = 0; i < got; i++) { + if (data[i] == '\033') + skip = 3; + if (skip) + skip--; + else + buf[size++] = data[i]; + } + buf[size] = '\0'; }
It seems that if for some reason you do a partial read on the QEmu console descriptor ending in the middle of the escape command you may have a problem. But it's possible that the way reads are done, and input is chunked garantees that this won't happen, DanP can probably confirm it's just fine :-)
We're reading from a Psuedo-TTY which is line buffered, so I think the OS should guarentee that we can read a whole lines worth of data without getting EAGAIN. Dan. -- |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|

On Mon, Aug 13, 2007 at 12:54:05PM +0100, Daniel P. Berrange wrote:
On Mon, Aug 13, 2007 at 05:59:15AM -0400, Daniel Veillard wrote:
On Sun, Aug 12, 2007 at 07:11:34PM -0400, Jim Paris wrote:
Due to the TTY layer, sending "\n" to the qemu monitor translates into "\r\n" when received. This triggers a bug in older versions of QEMU (KVM <= 33) because the same command is executed twice, and still has problems with fixed QEMU because the "(qemu)" prompt is printed twice. Switch all monitor commands to end with "\r" which avoids both issues.
The QEMU monitor sends frequent terminal escape sequences, typically \033[D and \033[K. At times, these interfere with the prompt detection when they get sent between "\n" and "(qemu) ". Fix the issue by filtering out these sequences when they are received.
I think DanP can better comment on the QEmu interaction than me, but the patch looks simple and clean except:
It looks sane to me - I had no idea QEMU was sending this escape sequences.
@@ -1333,14 +1335,23 @@ static int qemudMonitorCommand(struct qemud_driver *driver ATTRIBUTE_UNUSED, return -1; } buf = b; - memmove(buf+size, data, got); - buf[size+got] = '\0'; - size += got; + + /* Copy data, skipping 3-byte escape sequences */ + for (i = 0; i < got; i++) { + if (data[i] == '\033') + skip = 3; + if (skip) + skip--; + else + buf[size++] = data[i]; + } + buf[size] = '\0'; }
It seems that if for some reason you do a partial read on the QEmu console descriptor ending in the middle of the escape command you may have a problem. But it's possible that the way reads are done, and input is chunked garantees that this won't happen, DanP can probably confirm it's just fine :-)
We're reading from a Psuedo-TTY which is line buffered, so I think the OS should guarentee that we can read a whole lines worth of data without getting EAGAIN.
+1 then Daniel -- Red Hat Virtualization group http://redhat.com/virtualization/ Daniel Veillard | virtualization library http://libvirt.org/ veillard@redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

Daniel Veillard wrote:
+ /* Copy data, skipping 3-byte escape sequences */ + for (i = 0; i < got; i++) { + if (data[i] == '\033') + skip = 3; + if (skip) + skip--; + else + buf[size++] = data[i]; + } + buf[size] = '\0'; }
It seems that if for some reason you do a partial read on the QEmu console descriptor ending in the middle of the escape command you may have a problem.
It should be OK. Partial reads are why I'm setting using the "skip" variable which is persistent across read() calls. Any time we see '\033' we'll always skip three bytes from qemu. Note that partial reads across qemuMonitorCommand calls doesn't really matter, because we really just care about finding the next prompt anyway, and there shouldn't be any data received between the prompt and the execution of the next command. Daniel P. Berrange wrote:
We're reading from a Psuedo-TTY which is line buffered, so I think the OS should guarentee that we can read a whole lines worth of data without getting EAGAIN.
QEMU disables line buffering when it initializes the pty.
It looks sane to me - I had no idea QEMU was sending this escape sequences.
It comes from qemu's readline.c:term_update and is a bit of a pain. ... Actually, on closer inspection, I think this patch might be misguided. The only case where you should get escape sequences after the "\n" but before the "(qemu)" is when you are sending CRLF to a version of KVM that's supposed to be better at handling it -- it turns out my kvm patch was incomplete and didn't reset all of the input state. When monitor commands are terminated with "\r" rather than "\n", this should never occur. And so filtering escape sequences should be unnecessary, as they should only show up on the echoed command line. There were also some bugs in my libvirt patches (a merge error left two commands terminated with "\n", and I left some debug output). I'll fix things up and send an updated series. -jim

If nonzero, uses the supplied fd instead of /dev/null. Update callers accordingly. Signed-off-by: Jim Paris <jim@jtan.com> --- src/openvz_driver.c | 4 ++-- src/qemu_driver.c | 5 +++-- src/util.c | 12 ++++++------ src/util.h | 4 ++-- 4 files changed, 13 insertions(+), 12 deletions(-) diff --git a/src/openvz_driver.c b/src/openvz_driver.c index 84d514c..b0788f6 100644 --- a/src/openvz_driver.c +++ b/src/openvz_driver.c @@ -342,7 +342,7 @@ static int openvzListDomains(virConnectPtr conn, int *ids, int nids) { char buf[32]; const char *cmd[] = {VZLIST, "-ovpsid", "-H" , NULL}; - ret = virExec(conn, (char **)cmd, &pid, &outfd, &errfd); + ret = virExec(conn, (char **)cmd, &pid, 0, &outfd, &errfd); if(ret == -1) { error(conn, VIR_ERR_INTERNAL_ERROR, "Could not exec " VZLIST); return (int)NULL; @@ -373,7 +373,7 @@ static int openvzListDefinedDomains(virConnectPtr conn, const char *cmd[] = {VZLIST, "-ovpsid", "-H", NULL}; /* the -S options lists only stopped domains */ - ret = virExec(conn, (char **)cmd, &pid, &outfd, &errfd); + ret = virExec(conn, (char **)cmd, &pid, 0, &outfd, &errfd); if(ret == -1) { error(conn, VIR_ERR_INTERNAL_ERROR, "Could not exec " VZLIST); return (int)NULL; diff --git a/src/qemu_driver.c b/src/qemu_driver.c index 8063ad2..553aa21 100644 --- a/src/qemu_driver.c +++ b/src/qemu_driver.c @@ -655,7 +655,8 @@ static int qemudStartVMDaemon(virConnectPtr conn, qemudLog(QEMUD_WARN, "Unable to write argv to logfile %d: %s", errno, strerror(errno)); - if (virExecNonBlock(conn, argv, &vm->pid, &vm->stdout, &vm->stderr) == 0) { + if (virExecNonBlock(conn, argv, &vm->pid, + 0, &vm->stdout, &vm->stderr) == 0) { vm->id = driver->nextvmid++; vm->state = VIR_DOMAIN_RUNNING; @@ -912,7 +913,7 @@ dhcpStartDhcpDaemon(virConnectPtr conn, if (qemudBuildDnsmasqArgv(conn, network, &argv) < 0) return -1; - ret = virExecNonBlock(conn, argv, &network->dnsmasqPid, NULL, NULL); + ret = virExecNonBlock(conn, argv, &network->dnsmasqPid, 0, NULL, NULL); for (i = 0; argv[i]; i++) free(argv[i]); diff --git a/src/util.c b/src/util.c index f53cfd2..546a7b8 100644 --- a/src/util.c +++ b/src/util.c @@ -79,7 +79,7 @@ static int virSetNonBlock(int fd) { static int _virExec(virConnectPtr conn, char **argv, - int *retpid, int *outfd, int *errfd, int non_block) { + int *retpid, int infd, int *outfd, int *errfd, int non_block) { int pid, null; int pipeout[2] = {-1,-1}; int pipeerr[2] = {-1,-1}; @@ -140,7 +140,7 @@ _virExec(virConnectPtr conn, if (pipeerr[0] > 0 && close(pipeerr[0]) < 0) _exit(1); - if (dup2(null, STDIN_FILENO) < 0) + if (dup2(infd > 0 ? infd : null, STDIN_FILENO) < 0) _exit(1); if (dup2(pipeout[1] > 0 ? pipeout[1] : null, STDOUT_FILENO) < 0) _exit(1); @@ -176,16 +176,16 @@ _virExec(virConnectPtr conn, int virExec(virConnectPtr conn, char **argv, - int *retpid, int *outfd, int *errfd) { + int *retpid, int infd, int *outfd, int *errfd) { - return(_virExec(conn, argv, retpid, outfd, errfd, 0)); + return(_virExec(conn, argv, retpid, infd, outfd, errfd, 0)); } int virExecNonBlock(virConnectPtr conn, char **argv, - int *retpid, int *outfd, int *errfd) { + int *retpid, int infd, int *outfd, int *errfd) { - return(_virExec(conn, argv, retpid, outfd, errfd, 1)); + return(_virExec(conn, argv, retpid, infd, outfd, errfd, 1)); } diff --git a/src/util.h b/src/util.h index 5b84043..d11e6d9 100644 --- a/src/util.h +++ b/src/util.h @@ -21,6 +21,6 @@ * File created Jul 18, 2007 - Shuveb Hussain <shuveb@binarykarma.com> */ -int virExec(virConnectPtr conn, char **argv, int *retpid, int *outfd, int *errfd); -int virExecNonBlock(virConnectPtr conn, char **argv, int *retpid, int *outfd, int *errfd); +int virExec(virConnectPtr conn, char **argv, int *retpid, int infd, int *outfd, int *errfd); +int virExecNonBlock(virConnectPtr conn, char **argv, int *retpid, int infd, int *outfd, int *errfd); -- 1.5.3.rc4

On Sun, Aug 12, 2007 at 07:11:35PM -0400, Jim Paris wrote:
If nonzero, uses the supplied fd instead of /dev/null. Update callers accordingly.
Looks fine to me, we already have stdout and stderr, it's sensible. +1 Daniel -- Red Hat Virtualization group http://redhat.com/virtualization/ Daniel Veillard | virtualization library http://libvirt.org/ veillard@redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

On Mon, Aug 13, 2007 at 06:01:42AM -0400, Daniel Veillard wrote:
On Sun, Aug 12, 2007 at 07:11:35PM -0400, Jim Paris wrote:
If nonzero, uses the supplied fd instead of /dev/null. Update callers accordingly.
Looks fine to me, we already have stdout and stderr, it's sensible. +1
Looks OK, but shouldn't we use -1 as the magic value here, since 0 is also a valid file descriptor. Regards, Dan. -- |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|

On Mon, Aug 13, 2007 at 12:55:01PM +0100, Daniel P. Berrange wrote:
On Mon, Aug 13, 2007 at 06:01:42AM -0400, Daniel Veillard wrote:
On Sun, Aug 12, 2007 at 07:11:35PM -0400, Jim Paris wrote:
If nonzero, uses the supplied fd instead of /dev/null. Update callers accordingly.
Looks fine to me, we already have stdout and stderr, it's sensible. +1
Looks OK, but shouldn't we use -1 as the magic value here, since 0 is also a valid file descriptor.
ohh, right, good catch ! +1 with that change, Daniel -- Red Hat Virtualization group http://redhat.com/virtualization/ Daniel Veillard | virtualization library http://libvirt.org/ veillard@redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

Daniel P. Berrange wrote:
If nonzero, uses the supplied fd instead of /dev/null. Update callers accordingly.
Looks fine to me, we already have stdout and stderr, it's sensible. +1
Looks OK, but shouldn't we use -1 as the magic value here, since 0 is also a valid file descriptor.
Yeah, that was me taking advantage of the fact that the qemu_vm structure is initialized to 0. I'll change that to -1 and make sure all qemu_vm users initialize stdinFd correctly. -jim

Adds new fields in qemu_vm structure. vm->migrateFrom specifies the argument to "-incoming". vm->stdinFd specifies the file descriptor to pass to virExec as stdin, which will be used for the "-incoming stdio" case. Signed-off-by: Jim Paris <jim@jtan.com> --- src/qemu_conf.c | 12 +++++++++++- src/qemu_conf.h | 2 ++ src/qemu_driver.c | 4 ++-- 3 files changed, 15 insertions(+), 3 deletions(-) diff --git a/src/qemu_conf.c b/src/qemu_conf.c index 79dd180..f02d693 100644 --- a/src/qemu_conf.c +++ b/src/qemu_conf.c @@ -1518,7 +1518,8 @@ int qemudBuildCommandLine(virConnectPtr conn, (vm->def->os.initrd[0] ? 2 : 0) + /* initrd */ (vm->def->os.cmdline[0] ? 2 : 0) + /* cmdline */ (vm->def->graphicsType == QEMUD_GRAPHICS_VNC ? 2 : - (vm->def->graphicsType == QEMUD_GRAPHICS_SDL ? 0 : 1)); /* graphics */ + (vm->def->graphicsType == QEMUD_GRAPHICS_SDL ? 0 : 1)) + /* graphics */ + (vm->migrateFrom[0] ? 3 : 0); /* migrateFrom */ snprintf(memory, sizeof(memory), "%d", vm->def->memory/1024); snprintf(vcpus, sizeof(vcpus), "%d", vm->def->vcpus); @@ -1767,6 +1768,15 @@ int qemudBuildCommandLine(virConnectPtr conn, /* SDL is the default. no args needed */ } + if (vm->migrateFrom[0]) { + if (!((*argv)[++n] = strdup("-S"))) + goto no_memory; + if (!((*argv)[++n] = strdup("-incoming"))) + goto no_memory; + if (!((*argv)[++n] = strdup(vm->migrateFrom))) + goto no_memory; + } + (*argv)[++n] = NULL; return 0; diff --git a/src/qemu_conf.h b/src/qemu_conf.h index 60a38b7..ba61264 100644 --- a/src/qemu_conf.h +++ b/src/qemu_conf.h @@ -212,6 +212,8 @@ struct qemud_vm { char configFile[PATH_MAX]; char autostartLink[PATH_MAX]; + char migrateFrom[PATH_MAX]; + int stdinFd; struct qemud_vm_def *def; /* The current definition */ struct qemud_vm_def *newDef; /* New definition to activate at shutdown */ diff --git a/src/qemu_driver.c b/src/qemu_driver.c index 553aa21..e487640 100644 --- a/src/qemu_driver.c +++ b/src/qemu_driver.c @@ -656,9 +656,9 @@ static int qemudStartVMDaemon(virConnectPtr conn, errno, strerror(errno)); if (virExecNonBlock(conn, argv, &vm->pid, - 0, &vm->stdout, &vm->stderr) == 0) { + vm->stdinFd, &vm->stdout, &vm->stderr) == 0) { vm->id = driver->nextvmid++; - vm->state = VIR_DOMAIN_RUNNING; + vm->state = vm->migrateFrom[0] ? VIR_DOMAIN_PAUSED : VIR_DOMAIN_RUNNING; driver->ninactivevms--; driver->nactivevms++; -- 1.5.3.rc4

Use this to escape a shell argument in a commandline passed to qemu. First we need to escape certain characters to get them through the qemu monitor interface. On the shell side, the argument will be enclosed in single quotes, so the only character that needs special treatment is the single quote itself. Signed-off-by: Jim Paris <jim@jtan.com> --- src/qemu_driver.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 66 insertions(+), 0 deletions(-) diff --git a/src/qemu_driver.c b/src/qemu_driver.c index e487640..5d310fe 100644 --- a/src/qemu_driver.c +++ b/src/qemu_driver.c @@ -1866,6 +1866,72 @@ static int qemudDomainGetInfo(virDomainPtr dom, } +static char *qemudEscapeShellArg(const char *in) +{ + int len = 0; + int i, j; + char *out; + + /* To pass through the QEMU monitor, we need to use escape + sequences: \r, \n, \", \\ + + To pass through both QEMU + the shell, we need to escape + the single character ' as the five characters '\\'' + */ + + for (i = 0; in[i] != '\0'; i++) { + switch(in[i]) { + case '\r': + case '\n': + case '"': + case '\\': + len += 2; + break; + case '\'': + len += 5; + break; + default: + len += 1; + break; + } + } + + if ((out = (char *)malloc(len + 1)) == NULL) + return NULL; + + for (i = j = 0; in[i] != '\0'; i++) { + switch(in[i]) { + case '\r': + out[j++] = '\\'; + out[j++] = 'r'; + break; + case '\n': + out[j++] = '\\'; + out[j++] = 'n'; + break; + case '"': + case '\\': + out[j++] = '\\'; + out[j++] = in[i]; + break; + case '\'': + out[j++] = '\''; + out[j++] = '\\'; + out[j++] = '\\'; + out[j++] = '\''; + out[j++] = '\''; + break; + default: + out[j++] = in[i]; + break; + } + } + out[j] = '\0'; + + return out; +} + + static int qemudDomainSave(virDomainPtr dom, const char *path ATTRIBUTE_UNUSED) { struct qemud_driver *driver = (struct qemud_driver *)dom->conn->privateData; -- 1.5.3.rc4

The save file format consists of a header, XML for the domain, and the raw QEMU/KVM migration data stream. Signed-off-by: Jim Paris <jim@jtan.com> --- src/qemu_driver.c | 109 ++++++++++++++++++++++++++++++++++++++++++++++++++--- 1 files changed, 103 insertions(+), 6 deletions(-) diff --git a/src/qemu_driver.c b/src/qemu_driver.c index 5d310fe..50ab702 100644 --- a/src/qemu_driver.c +++ b/src/qemu_driver.c @@ -1932,20 +1932,117 @@ static char *qemudEscapeShellArg(const char *in) } -static int qemudDomainSave(virDomainPtr dom, - const char *path ATTRIBUTE_UNUSED) { +#define QEMUD_SAVE_MAGIC "LibvirtQemudSave" +struct qemud_save_header { + char magic[sizeof(QEMUD_SAVE_MAGIC)-1]; + int xml_len; + int was_running; +}; + +static int qemudDomainSave(virDomainPtr dom, + const char *path) { struct qemud_driver *driver = (struct qemud_driver *)dom->conn->privateData; struct qemud_vm *vm = qemudFindVMByID(driver, dom->id); + char *command, *info; + int fd; + char *safe_path; + char *xml; + struct qemud_save_header header; + + memset(&header, 0, sizeof(header)); + memcpy(header.magic, QEMUD_SAVE_MAGIC, sizeof(header.magic)); + if (!vm) { - qemudReportError(dom->conn, dom, NULL, VIR_ERR_INVALID_DOMAIN, "no domain with matching id %d", dom->id); + qemudReportError(dom->conn, dom, NULL, VIR_ERR_INVALID_DOMAIN, + "no domain with matching id %d", dom->id); return -1; } + if (!qemudIsActiveVM(vm)) { - qemudReportError(dom->conn, dom, NULL, VIR_ERR_OPERATION_FAILED, "domain is not running"); + qemudReportError(dom->conn, dom, NULL, VIR_ERR_OPERATION_FAILED, + "domain is not running"); return -1; } - qemudReportError(dom->conn, dom, NULL, VIR_ERR_OPERATION_FAILED, "save is not supported"); - return -1; + + /* Pause */ + if (vm->state == VIR_DOMAIN_RUNNING) { + header.was_running = 1; + if (qemudDomainSuspend(dom) != 0) { + qemudReportError(dom->conn, dom, NULL, VIR_ERR_OPERATION_FAILED, + "failed to pause domain"); + return -1; + } + } + + /* Get XML for the domain */ + xml = qemudGenerateXML(dom->conn, driver, vm, vm->def, 0); + if (!xml) { + qemudReportError(dom->conn, dom, NULL, VIR_ERR_OPERATION_FAILED, + "failed to get domain xml"); + return -1; + } + header.xml_len = strlen(xml); + + /* Write header to file, followed by XML */ + if ((fd = open(path, O_CREAT|O_TRUNC|O_WRONLY, S_IRUSR|S_IWUSR)) < 0) { + qemudReportError(dom->conn, dom, NULL, VIR_ERR_OPERATION_FAILED, + "failed to create '%s'", path); + free(xml); + return -1; + } + + if (write(fd, &header, sizeof(header)) != sizeof(header)) { + qemudReportError(dom->conn, dom, NULL, VIR_ERR_OPERATION_FAILED, + "failed to write save header"); + close(fd); + free(xml); + return -1; + } + + if (write(fd, xml, header.xml_len) != header.xml_len) { + qemudReportError(dom->conn, dom, NULL, VIR_ERR_OPERATION_FAILED, + "failed to write xml"); + close(fd); + free(xml); + return -1; + } + + close(fd); + free(xml); + + /* Migrate to file */ + safe_path = qemudEscapeShellArg(path); + if (!safe_path) { + qemudReportError(dom->conn, dom, NULL, VIR_ERR_OPERATION_FAILED, + "out of memory"); + return -1; + } + if (asprintf (&command, "migrate \"exec:" + "dd of='%s' oflag=append conv=notrunc 2>/dev/null" + "\"\n", safe_path) == -1) { + qemudReportError(dom->conn, dom, NULL, VIR_ERR_OPERATION_FAILED, + "out of memory"); + free(safe_path); + return -1; + } + free(safe_path); + + if (qemudMonitorCommand(driver, vm, command, &info) < 0) { + qemudReportError(dom->conn, dom, NULL, VIR_ERR_OPERATION_FAILED, + "migrate operation failed"); + free(command); + return -1; + } + + free(info); + free(command); + + /* Shut it down */ + qemudShutdownVMDaemon(dom->conn, driver, vm); + if (!vm->configFile[0]) + qemudRemoveInactiveVM(driver, vm); + + return 0; } -- 1.5.3.rc4

On Sun, Aug 12, 2007 at 07:11:38PM -0400, Jim Paris wrote:
The save file format consists of a header, XML for the domain, and the raw QEMU/KVM migration data stream. -static int qemudDomainSave(virDomainPtr dom, - const char *path ATTRIBUTE_UNUSED) { +#define QEMUD_SAVE_MAGIC "LibvirtQemudSave" +struct qemud_save_header { + char magic[sizeof(QEMUD_SAVE_MAGIC)-1];
I suggest to add an "int version" field here to be able to extend the format.
+ int xml_len; + int was_running;
and "int unused[16];" With those 2 we should be able to cope with backward compatibility on saved domain even if we don't know yet what may be needed. At worse it's a few bytes lost in a very big file, at best it's a life saver.
+ memset(&header, 0, sizeof(header)); + memcpy(header.magic, QEMUD_SAVE_MAGIC, sizeof(header.magic));
#define QEMU_SAVE_VERSION 1 somewhere and then header.version = QEMU_SAVE_VERSION;
+ header.xml_len = strlen(xml);
maybe + 1 to account for the trailing 0
+ if (write(fd, &header, sizeof(header)) != sizeof(header)) { + qemudReportError(dom->conn, dom, NULL, VIR_ERR_OPERATION_FAILED, + "failed to write save header"); + close(fd); + free(xml); + return -1; + }
I suggest to define or reuse existing safe write and read routines which handle interruped calls with EAGAIN. Daniel -- Red Hat Virtualization group http://redhat.com/virtualization/ Daniel Veillard | virtualization library http://libvirt.org/ veillard@redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

Signed-off-by: Jim Paris <jim@jtan.com> --- src/qemu_driver.c | 106 +++++++++++++++++++++++++++++++++++++++++++++++++++-- 1 files changed, 102 insertions(+), 4 deletions(-) diff --git a/src/qemu_driver.c b/src/qemu_driver.c index 50ab702..c6de8a0 100644 --- a/src/qemu_driver.c +++ b/src/qemu_driver.c @@ -2047,10 +2047,108 @@ static int qemudDomainSave(virDomainPtr dom, static int qemudDomainRestore(virConnectPtr conn, - const char *path ATTRIBUTE_UNUSED) { - /*struct qemud_driver *driver = (struct qemud_driver *)conn->privateData;*/ - qemudReportError(conn, NULL, NULL, VIR_ERR_OPERATION_FAILED, "restore is not supported"); - return -1; + const char *path) { + struct qemud_driver *driver = (struct qemud_driver *)conn->privateData; + struct qemud_vm_def *def; + struct qemud_vm *vm; + int fd; + char *xml; + struct qemud_save_header header; + + /* Verify the header and read the XML */ + if ((fd = open(path, O_RDONLY)) < 0) { + qemudReportError(conn, NULL, NULL, VIR_ERR_OPERATION_FAILED, + "cannot read domain image"); + return -1; + } + + if (read(fd, &header, sizeof(header)) != sizeof(header)) { + qemudReportError(conn, NULL, NULL, VIR_ERR_OPERATION_FAILED, + "failed to read qemu header"); + close(fd); + return -1; + } + + if (memcmp(header.magic, QEMUD_SAVE_MAGIC, sizeof(header.magic)) != 0) { + qemudReportError(conn, NULL, NULL, VIR_ERR_OPERATION_FAILED, + "image magic is incorrect"); + close(fd); + return -1; + } + + if ((xml = (char *)malloc(header.xml_len + 1)) == NULL) { + qemudReportError(conn, NULL, NULL, VIR_ERR_OPERATION_FAILED, + "out of memory"); + close(fd); + return -1; + } + + if (read(fd, xml, header.xml_len) != header.xml_len) { + qemudReportError(conn, NULL, NULL, VIR_ERR_OPERATION_FAILED, + "failed to read XML"); + close(fd); + free(xml); + return -1; + } + xml[header.xml_len] = '\0'; + + /* Create a domain from this XML */ + if (!(def = qemudParseVMDef(conn, driver, xml, NULL))) { + qemudReportError(conn, NULL, NULL, VIR_ERR_OPERATION_FAILED, + "failed to parse XML"); + close(fd); + free(xml); + return -1; + } + free(xml); + + /* Ensure the name and UUID don't already exist in an active VM */ + vm = qemudFindVMByUUID(driver, def->uuid); + if (!vm) vm = qemudFindVMByName(driver, def->name); + if (vm && qemudIsActiveVM(vm)) { + qemudReportError(conn, NULL, NULL, VIR_ERR_OPERATION_FAILED, + "domain to restore is already active"); + close(fd); + return -1; + } + + if (!(vm = qemudAssignVMDef(conn, driver, def))) { + qemudReportError(conn, NULL, NULL, VIR_ERR_OPERATION_FAILED, + "failed to assign new VM"); + qemudFreeVMDef(def); + close(fd); + return -1; + } + + /* Set the migration source and start it up. */ + snprintf(vm->migrateFrom, sizeof(vm->migrateFrom), "stdio"); + vm->stdinFd = fd; + + if (qemudStartVMDaemon(conn, driver, vm) < 0) { + qemudReportError(conn, NULL, NULL, VIR_ERR_OPERATION_FAILED, + "failed to start VM"); + if (!vm->configFile[0]) + qemudRemoveInactiveVM(driver, vm); + close(fd); + return -1; + } + close(fd); + vm->migrateFrom[0] = '\0'; + vm->stdinFd = 0; + + /* If it was running before, resume it now. */ + if (header.was_running) { + char *info; + if (qemudMonitorCommand(driver, vm, "cont\n", &info) < 0) { + qemudReportError(conn, NULL, NULL, VIR_ERR_OPERATION_FAILED, + "failed to resume domain"); + return -1; + } + free(info); + vm->state = VIR_DOMAIN_RUNNING; + } + + return 0; } -- 1.5.3.rc4

On Sun, Aug 12, 2007 at 07:11:39PM -0400, Jim Paris wrote:
Signed-off-by: Jim Paris <jim@jtan.com>
+ if ((xml = (char *)malloc(header.xml_len + 1)) == NULL) { + qemudReportError(conn, NULL, NULL, VIR_ERR_OPERATION_FAILED, + "out of memory"); + close(fd); + return -1; + } + + if (read(fd, xml, header.xml_len) != header.xml_len) { + qemudReportError(conn, NULL, NULL, VIR_ERR_OPERATION_FAILED, + "failed to read XML"); + close(fd); + free(xml); + return -1; + } + xml[header.xml_len] = '\0';
I would rather save xml_len to include the trailing 0 as part of the save and simplify the + 1. For example it would be legal to have an XML description done in UTF-16, where adding a single trailing 0 byte would not be sufficient, better consider the lenght to be the full data, NUL character included. Oh and wrapping the read() here too, as suggested in previous patch.
+ vm = qemudFindVMByUUID(driver, def->uuid); + if (!vm) vm = qemudFindVMByName(driver, def->name); + if (vm && qemudIsActiveVM(vm)) { + qemudReportError(conn, NULL, NULL, VIR_ERR_OPERATION_FAILED, + "domain to restore is already active");
Would be great to have the name printed here in the error message, allowing to take corrective action or at least some easy analysis thanks, Daniel -- Red Hat Virtualization group http://redhat.com/virtualization/ Daniel Veillard | virtualization library http://libvirt.org/ veillard@redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

Jim Paris wrote:
+ if (strchr(path, '\'') || strchr(path, '\\') ) { + qemudReportError(dom->conn, dom, NULL, VIR_ERR_OPERATION_FAILED, + "invalid filename"); + return -1; + } [...] + /* Migrate to file. */ + if (asprintf (&command, "migrate \"exec:dd of='%s' 2>/dev/null\"\n", + path) == -1) { + qemudReportError(dom->conn, dom, NULL, VIR_ERR_OPERATION_FAILED, + "out of memory"); + return -1; + }
The patch is fine, except I'm wondering whether the quoting above is safe. We check if the path contains ' or \ and refuse to proceed. I _think_ you don't need to check for \ however, according to this section from the bash manual page and my testing: Enclosing characters in single quotes preserves the literal value of each character within the quotes. A single quote may not occur between single quotes, even when preceded by a backslash. Perhaps it is better to be safe than sorry though. Rich. -- Emerging Technologies, Red Hat - http://et.redhat.com/~rjones/ Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SL4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 03798903

Richard W.M. Jones wrote:
Jim Paris wrote:
+ if (strchr(path, '\'') || strchr(path, '\\') ) { + qemudReportError(dom->conn, dom, NULL, VIR_ERR_OPERATION_FAILED, + "invalid filename"); + return -1; + } [...] + /* Migrate to file. */ + if (asprintf (&command, "migrate \"exec:dd of='%s' 2>/dev/null\"\n", + path) == -1) { + qemudReportError(dom->conn, dom, NULL, VIR_ERR_OPERATION_FAILED, + "out of memory"); + return -1; + }
The patch is fine, except I'm wondering whether the quoting above is safe. We check if the path contains ' or \ and refuse to proceed. I _think_ you don't need to check for \ however
I think you're right. An even better fix would be to explicitly escape bad characters in the path before passing them along. Giving an error on the filename "Jim's VM" as it would do right now isn't ideal. -jim
participants (4)
-
Daniel P. Berrange
-
Daniel Veillard
-
Jim Paris
-
Richard W.M. Jones