[libvirt] [RFC: PATCHv4 00/15] outgoing migration via fd: rather than exec:

v3 has not had complete reviews yet, located at: https://www.redhat.com/archives/libvir-list/2011-March/msg00132.html I'm posting v4 rebased on the latest tree, to get more feedback and possibly ACKs for the earlier half of the series. Differences in v4 (still incomplete): rearrange some of the earlier patches, incorporate Paolo's feedback Currently well-tested through patch 11. Remaining patches (12-15) are compiling but not yet tested, but should form the basis for doing compressed migration. Also available via: git fetch git://repo.or.cz/libvirt/ericb.git fd-migration Still to go - I really hate that doCoreDump can't use fd-migration until I make it compute qemuCaps; really, we should be computing caps once at vm startup, and remembering it elsewhere doCoreDump only works to regular files and fails on root-squash NFS, it should be possible to fix that I really need to write virCommandKill, which silently kills an async child process (if one is running) without overwriting any error messages Needs more testing - especially on root-squash NFS Eric Blake (15): qemu: use lighter-weight fd:n on incoming tunneled migration qemu: consolidate duplicated monitor migration code qemu: improve efficiency of dd during snapshots qemu: support migration to fd util: use SCM_RIGHTS in virFileOperation when needed qemu: allow simple domain save to use fd: protocol qemu: simplify domain save fd handling storage: simplify fd handling util: rename virFileOperation to virFileOpenAs util: adjust indentation in previous patch qemu, storage: improve type safety qemu: use common API for reading difficult files qemu: consolidate migration to file code qemu: skip granting access during fd migration qemu: support fd: migration with compression src/libvirt_private.syms | 2 +- src/qemu/qemu_driver.c | 626 ++++++++++++++++------------------------- src/qemu/qemu_migration.c | 45 +-- src/qemu/qemu_monitor.c | 124 ++++++++- src/qemu/qemu_monitor.h | 4 + src/qemu/qemu_monitor_json.c | 125 +-------- src/qemu/qemu_monitor_json.h | 23 +-- src/qemu/qemu_monitor_text.c | 123 +-------- src/qemu/qemu_monitor_text.h | 23 +-- src/storage/storage_backend.c | 80 +++--- src/util/util.c | 161 ++++++++---- src/util/util.h | 15 +- 12 files changed, 544 insertions(+), 807 deletions(-) -- 1.7.4

Outgoing migration still uses a Unix socket and or exec netcat until the next patch. * src/qemu/qemu_migration.c (qemuMigrationPrepareTunnel): Replace Unix socket with simpler pipe. Suggested by Paolo Bonzini. --- src/qemu/qemu_migration.c | 45 +++++++++++++-------------------------------- 1 files changed, 13 insertions(+), 32 deletions(-) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 8ce6e17..f450130 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -239,11 +239,10 @@ qemuMigrationPrepareTunnel(struct qemud_driver *driver, { virDomainDefPtr def = NULL; virDomainObjPtr vm = NULL; - char *migrateFrom; virDomainEventPtr event = NULL; int ret = -1; int internalret; - char *unixfile = NULL; + int dataFD[2] = { -1, -1 }; virBitmapPtr qemuCaps = NULL; qemuDomainObjPrivatePtr priv = NULL; struct timeval now; @@ -289,12 +288,12 @@ qemuMigrationPrepareTunnel(struct qemud_driver *driver, /* Domain starts inactive, even if the domain XML had an id field. */ vm->def->id = -1; - if (virAsprintf(&unixfile, "%s/qemu.tunnelmigrate.dest.%s", - driver->libDir, vm->def->name) < 0) { - virReportOOMError(); + if (pipe(dataFD) < 0 || + virSetCloseExec(dataFD[0]) < 0) { + virReportSystemError(errno, "%s", + _("cannot create pipe for tunnelled migration")); goto endjob; } - unlink(unixfile); /* check that this qemu version supports the interactive exec */ if (qemuCapsExtractVersionInfo(vm->def->emulator, vm->def->os.arch, @@ -304,25 +303,11 @@ qemuMigrationPrepareTunnel(struct qemud_driver *driver, vm->def->emulator); goto endjob; } - if (qemuCapsGet(qemuCaps, QEMU_CAPS_MIGRATE_QEMU_UNIX)) - internalret = virAsprintf(&migrateFrom, "unix:%s", unixfile); - else if (qemuCapsGet(qemuCaps, QEMU_CAPS_MIGRATE_QEMU_EXEC)) - internalret = virAsprintf(&migrateFrom, "exec:nc -U -l %s", unixfile); - else { - qemuReportError(VIR_ERR_OPERATION_FAILED, - "%s", _("Destination qemu is too old to support tunnelled migration")); - goto endjob; - } - if (internalret < 0) { - virReportOOMError(); - goto endjob; - } /* Start the QEMU daemon, with the same command-line arguments plus - * -incoming unix:/path/to/file or exec:nc -U /path/to/file + * -incoming stdin (which qemu_command might convert to exec:cat or fd:n) */ - internalret = qemuProcessStart(dconn, driver, vm, migrateFrom, true, - -1, NULL, VIR_VM_OP_MIGRATE_IN_START); - VIR_FREE(migrateFrom); + internalret = qemuProcessStart(dconn, driver, vm, "stdin", true, dataFD[1], + NULL, VIR_VM_OP_MIGRATE_IN_START); if (internalret < 0) { qemuAuditDomainStart(vm, "migrated", false); /* Note that we don't set an error here because qemuProcessStart @@ -335,9 +320,7 @@ qemuMigrationPrepareTunnel(struct qemud_driver *driver, goto endjob; } - if (virFDStreamConnectUNIX(st, - unixfile, - false) < 0) { + if (virFDStreamOpen(st, dataFD[0]) < 0) { qemuAuditDomainStart(vm, "migrated", false); qemuProcessStop(driver, vm, 0); if (!vm->persistent) { @@ -345,9 +328,8 @@ qemuMigrationPrepareTunnel(struct qemud_driver *driver, virDomainRemoveInactive(&driver->domains, vm); vm = NULL; } - virReportSystemError(errno, - _("cannot open unix socket '%s' for tunnelled migration"), - unixfile); + virReportSystemError(errno, "%s", + _("cannot pass pipe for tunnelled migration")); goto endjob; } @@ -378,9 +360,8 @@ endjob: cleanup: qemuCapsFree(qemuCaps); virDomainDefFree(def); - if (unixfile) - unlink(unixfile); - VIR_FREE(unixfile); + VIR_FORCE_CLOSE(dataFD[0]); + VIR_FORCE_CLOSE(dataFD[1]); if (vm) virDomainObjUnlock(vm); if (event) -- 1.7.4

On Wed, Mar 09, 2011 at 06:45:41PM -0700, Eric Blake wrote:
Outgoing migration still uses a Unix socket and or exec netcat until the next patch.
* src/qemu/qemu_migration.c (qemuMigrationPrepareTunnel): Replace Unix socket with simpler pipe. Suggested by Paolo Bonzini. --- src/qemu/qemu_migration.c | 45 +++++++++++++-------------------------------- 1 files changed, 13 insertions(+), 32 deletions(-)
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

* src/qemu/qemu_monitor_text.h (qemuMonitorTextMigrate): Declare in place of individual monitor commands. * src/qemu/qemu_monitor_json.h (qemuMonitorJSONMigrate): Likewise. * src/qemu/qemu_monitor_text.c (qemuMonitorTextMigrateToHost) (qemuMonitorTextMigrateToCommand, qemuMonitorTextMigrateToFile) (qemuMonitorTextMigrateToUnix): Delete. * src/qemu/qemu_monitor_json.c (qemuMonitorJSONMigrateToHost) (qemuMonitorJSONMigrateToCommand, qemuMonitorJSONMigrateToFile) (qemuMonitorJSONMigrateToUnix): Delete. * src/qemu/qemu_monitor.c (qemuMonitorMigrateToHost) (qemuMonitorMigrateToCommand, qemuMonitorMigrateToFile) (qemuMonitorMigrateToUnix): Consolidate shared code. --- src/qemu/qemu_monitor.c | 92 +++++++++++++++++++++++++++---- src/qemu/qemu_monitor_json.c | 125 +---------------------------------------- src/qemu/qemu_monitor_json.h | 23 +------ src/qemu/qemu_monitor_text.c | 123 +---------------------------------------- src/qemu/qemu_monitor_text.h | 23 +------ 5 files changed, 97 insertions(+), 289 deletions(-) diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index dfb1aad..eb6f8d4 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -1390,6 +1390,7 @@ int qemuMonitorMigrateToHost(qemuMonitorPtr mon, int port) { int ret; + char *uri = NULL; VIR_DEBUG("mon=%p hostname=%s port=%d flags=%u", mon, hostname, port, flags); @@ -1399,10 +1400,18 @@ int qemuMonitorMigrateToHost(qemuMonitorPtr mon, return -1; } + + if (virAsprintf(&uri, "tcp:%s:%d", hostname, port) < 0) { + virReportOOMError(); + return -1; + } + if (mon->json) - ret = qemuMonitorJSONMigrateToHost(mon, flags, hostname, port); + ret = qemuMonitorJSONMigrate(mon, flags, uri); else - ret = qemuMonitorTextMigrateToHost(mon, flags, hostname, port); + ret = qemuMonitorTextMigrate(mon, flags, uri); + + VIR_FREE(uri); return ret; } @@ -1411,7 +1420,9 @@ int qemuMonitorMigrateToCommand(qemuMonitorPtr mon, unsigned int flags, const char * const *argv) { - int ret; + char *argstr; + char *dest = NULL; + int ret = -1; VIR_DEBUG("mon=%p argv=%p flags=%u", mon, argv, flags); @@ -1421,10 +1432,25 @@ int qemuMonitorMigrateToCommand(qemuMonitorPtr mon, return -1; } + argstr = virArgvToString(argv); + if (!argstr) { + virReportOOMError(); + goto cleanup; + } + + if (virAsprintf(&dest, "exec:%s", argstr) < 0) { + virReportOOMError(); + goto cleanup; + } + if (mon->json) - ret = qemuMonitorJSONMigrateToCommand(mon, flags, argv); + ret = qemuMonitorJSONMigrate(mon, flags, dest); else - ret = qemuMonitorTextMigrateToCommand(mon, flags, argv); + ret = qemuMonitorTextMigrate(mon, flags, dest); + +cleanup: + VIR_FREE(argstr); + VIR_FREE(dest); return ret; } @@ -1434,7 +1460,10 @@ int qemuMonitorMigrateToFile(qemuMonitorPtr mon, const char *target, unsigned long long offset) { - int ret; + char *argstr; + char *dest = NULL; + int ret = -1; + char *safe_target = NULL; VIR_DEBUG("mon=%p argv=%p target=%s offset=%llu flags=%u", mon, argv, target, offset, flags); @@ -1451,10 +1480,43 @@ int qemuMonitorMigrateToFile(qemuMonitorPtr mon, return -1; } + argstr = virArgvToString(argv); + if (!argstr) { + virReportOOMError(); + goto cleanup; + } + + /* Migrate to file */ + safe_target = qemuMonitorEscapeShell(target); + if (!safe_target) { + virReportOOMError(); + goto cleanup; + } + + /* Two dd processes, sharing the same stdout, are necessary to + * allow starting at an alignment of 512, but without wasting + * padding to get to the larger alignment useful for speed. Use + * <> redirection to avoid truncating a regular file. */ + if (virAsprintf(&dest, "exec:" VIR_WRAPPER_SHELL_PREFIX "%s | " + "{ dd bs=%llu seek=%llu if=/dev/null && " + "dd bs=%llu; } 1<>%s" VIR_WRAPPER_SHELL_SUFFIX, + argstr, QEMU_MONITOR_MIGRATE_TO_FILE_BS, + offset / QEMU_MONITOR_MIGRATE_TO_FILE_BS, + QEMU_MONITOR_MIGRATE_TO_FILE_TRANSFER_SIZE, + safe_target) < 0) { + virReportOOMError(); + goto cleanup; + } + if (mon->json) - ret = qemuMonitorJSONMigrateToFile(mon, flags, argv, target, offset); + ret = qemuMonitorJSONMigrate(mon, flags, dest); else - ret = qemuMonitorTextMigrateToFile(mon, flags, argv, target, offset); + ret = qemuMonitorTextMigrate(mon, flags, dest); + +cleanup: + VIR_FREE(safe_target); + VIR_FREE(argstr); + VIR_FREE(dest); return ret; } @@ -1462,7 +1524,8 @@ int qemuMonitorMigrateToUnix(qemuMonitorPtr mon, unsigned int flags, const char *unixfile) { - int ret; + char *dest = NULL; + int ret = -1; VIR_DEBUG("mon=%p, unixfile=%s flags=%u", mon, unixfile, flags); @@ -1472,10 +1535,17 @@ int qemuMonitorMigrateToUnix(qemuMonitorPtr mon, return -1; } + if (virAsprintf(&dest, "unix:%s", unixfile) < 0) { + virReportOOMError(); + return -1; + } + if (mon->json) - ret = qemuMonitorJSONMigrateToUnix(mon, flags, unixfile); + ret = qemuMonitorJSONMigrate(mon, flags, dest); else - ret = qemuMonitorTextMigrateToUnix(mon, flags, unixfile); + ret = qemuMonitorTextMigrate(mon, flags, dest); + + VIR_FREE(dest); return ret; } diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index e6903a1..5cf35b9 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -1,7 +1,7 @@ /* * qemu_monitor_json.c: interaction with QEMU monitor console * - * Copyright (C) 2006-2010 Red Hat, Inc. + * Copyright (C) 2006-2011 Red Hat, Inc. * Copyright (C) 2006 Daniel P. Berrange * * This library is free software; you can redistribute it and/or @@ -1669,9 +1669,9 @@ int qemuMonitorJSONGetMigrationStatus(qemuMonitorPtr mon, } -static int qemuMonitorJSONMigrate(qemuMonitorPtr mon, - unsigned int flags, - const char *uri) +int qemuMonitorJSONMigrate(qemuMonitorPtr mon, + unsigned int flags, + const char *uri) { int ret; virJSONValuePtr cmd = @@ -1696,123 +1696,6 @@ static int qemuMonitorJSONMigrate(qemuMonitorPtr mon, return ret; } - -int qemuMonitorJSONMigrateToHost(qemuMonitorPtr mon, - unsigned int flags, - const char *hostname, - int port) -{ - char *uri = NULL; - int ret; - - if (virAsprintf(&uri, "tcp:%s:%d", hostname, port) < 0) { - virReportOOMError(); - return -1; - } - - ret = qemuMonitorJSONMigrate(mon, flags, uri); - - VIR_FREE(uri); - - return ret; -} - - -int qemuMonitorJSONMigrateToCommand(qemuMonitorPtr mon, - unsigned int flags, - const char * const *argv) -{ - char *argstr; - char *dest = NULL; - int ret = -1; - - argstr = virArgvToString(argv); - if (!argstr) { - virReportOOMError(); - goto cleanup; - } - - if (virAsprintf(&dest, "exec:%s", argstr) < 0) { - virReportOOMError(); - goto cleanup; - } - - ret = qemuMonitorJSONMigrate(mon, flags, dest); - -cleanup: - VIR_FREE(argstr); - VIR_FREE(dest); - return ret; -} - -int qemuMonitorJSONMigrateToFile(qemuMonitorPtr mon, - unsigned int flags, - const char * const *argv, - const char *target, - unsigned long long offset) -{ - char *argstr; - char *dest = NULL; - int ret = -1; - char *safe_target = NULL; - - argstr = virArgvToString(argv); - if (!argstr) { - virReportOOMError(); - goto cleanup; - } - - /* Migrate to file */ - safe_target = qemuMonitorEscapeShell(target); - if (!safe_target) { - virReportOOMError(); - goto cleanup; - } - - /* Two dd processes, sharing the same stdout, are necessary to - * allow starting at an alignment of 512, but without wasting - * padding to get to the larger alignment useful for speed. Use - * <> redirection to avoid truncating a regular file. */ - if (virAsprintf(&dest, "exec:" VIR_WRAPPER_SHELL_PREFIX "%s | " - "{ dd bs=%llu seek=%llu if=/dev/null && " - "dd bs=%llu; } 1<>%s" VIR_WRAPPER_SHELL_SUFFIX, - argstr, QEMU_MONITOR_MIGRATE_TO_FILE_BS, - offset / QEMU_MONITOR_MIGRATE_TO_FILE_BS, - QEMU_MONITOR_MIGRATE_TO_FILE_TRANSFER_SIZE, - safe_target) < 0) { - virReportOOMError(); - goto cleanup; - } - - ret = qemuMonitorJSONMigrate(mon, flags, dest); - -cleanup: - VIR_FREE(safe_target); - VIR_FREE(argstr); - VIR_FREE(dest); - return ret; -} - -int qemuMonitorJSONMigrateToUnix(qemuMonitorPtr mon, - unsigned int flags, - const char *unixfile) -{ - char *dest = NULL; - int ret = -1; - - if (virAsprintf(&dest, "unix:%s", unixfile) < 0) { - virReportOOMError(); - return -1; - } - - ret = qemuMonitorJSONMigrate(mon, flags, dest); - - VIR_FREE(dest); - - return ret; -} - - int qemuMonitorJSONMigrateCancel(qemuMonitorPtr mon) { int ret; diff --git a/src/qemu/qemu_monitor_json.h b/src/qemu/qemu_monitor_json.h index 4ae472a..bca7070 100644 --- a/src/qemu/qemu_monitor_json.h +++ b/src/qemu/qemu_monitor_json.h @@ -1,7 +1,7 @@ /* * qemu_monitor_json.h: interaction with QEMU monitor console * - * Copyright (C) 2006-2009 Red Hat, Inc. + * Copyright (C) 2006-2009, 2011 Red Hat, Inc. * Copyright (C) 2006 Daniel P. Berrange * * This library is free software; you can redistribute it and/or @@ -104,24 +104,9 @@ int qemuMonitorJSONGetMigrationStatus(qemuMonitorPtr mon, unsigned long long *remaining, unsigned long long *total); -int qemuMonitorJSONMigrateToHost(qemuMonitorPtr mon, - unsigned int flags, - const char *hostname, - int port); - -int qemuMonitorJSONMigrateToCommand(qemuMonitorPtr mon, - unsigned int flags, - const char * const *argv); - -int qemuMonitorJSONMigrateToFile(qemuMonitorPtr mon, - unsigned int flags, - const char * const *argv, - const char *target, - unsigned long long offset); - -int qemuMonitorJSONMigrateToUnix(qemuMonitorPtr mon, - unsigned int flags, - const char *unixfile); +int qemuMonitorJSONMigrate(qemuMonitorPtr mon, + unsigned int flags, + const char *uri); int qemuMonitorJSONMigrateCancel(qemuMonitorPtr mon); diff --git a/src/qemu/qemu_monitor_text.c b/src/qemu/qemu_monitor_text.c index 0aed690..9b93e1c 100644 --- a/src/qemu/qemu_monitor_text.c +++ b/src/qemu/qemu_monitor_text.c @@ -1,7 +1,7 @@ /* * qemu_monitor_text.c: interaction with QEMU monitor console * - * Copyright (C) 2006-2010 Red Hat, Inc. + * Copyright (C) 2006-2011 Red Hat, Inc. * Copyright (C) 2006 Daniel P. Berrange * * This library is free software; you can redistribute it and/or @@ -1205,9 +1205,9 @@ cleanup: } -static int qemuMonitorTextMigrate(qemuMonitorPtr mon, - unsigned int flags, - const char *dest) +int qemuMonitorTextMigrate(qemuMonitorPtr mon, + unsigned int flags, + const char *dest) { char *cmd = NULL; char *info = NULL; @@ -1271,121 +1271,6 @@ cleanup: return ret; } -int qemuMonitorTextMigrateToHost(qemuMonitorPtr mon, - unsigned int flags, - const char *hostname, - int port) -{ - char *uri = NULL; - int ret; - - if (virAsprintf(&uri, "tcp:%s:%d", hostname, port) < 0) { - virReportOOMError(); - return -1; - } - - ret = qemuMonitorTextMigrate(mon, flags, uri); - - VIR_FREE(uri); - - return ret; -} - - -int qemuMonitorTextMigrateToCommand(qemuMonitorPtr mon, - unsigned int flags, - const char * const *argv) -{ - char *argstr; - char *dest = NULL; - int ret = -1; - - argstr = virArgvToString(argv); - if (!argstr) { - virReportOOMError(); - goto cleanup; - } - - if (virAsprintf(&dest, "exec:%s", argstr) < 0) { - virReportOOMError(); - goto cleanup; - } - - ret = qemuMonitorTextMigrate(mon, flags, dest); - -cleanup: - VIR_FREE(argstr); - VIR_FREE(dest); - return ret; -} - -int qemuMonitorTextMigrateToFile(qemuMonitorPtr mon, - unsigned int flags, - const char * const *argv, - const char *target, - unsigned long long offset) -{ - char *argstr; - char *dest = NULL; - int ret = -1; - char *safe_target = NULL; - - argstr = virArgvToString(argv); - if (!argstr) { - virReportOOMError(); - goto cleanup; - } - - /* Migrate to file */ - safe_target = qemuMonitorEscapeShell(target); - if (!safe_target) { - virReportOOMError(); - goto cleanup; - } - - /* Two dd processes, sharing the same stdout, are necessary to - * allow starting at an alignment of 512, but without wasting - * padding to get to the larger alignment useful for speed. Use - * <> redirection to avoid truncating a regular file. */ - if (virAsprintf(&dest, "exec:" VIR_WRAPPER_SHELL_PREFIX "%s | " - "{ dd bs=%llu seek=%llu if=/dev/null && " - "dd bs=%llu; } 1<>%s" VIR_WRAPPER_SHELL_SUFFIX, - argstr, QEMU_MONITOR_MIGRATE_TO_FILE_BS, - offset / QEMU_MONITOR_MIGRATE_TO_FILE_BS, - QEMU_MONITOR_MIGRATE_TO_FILE_TRANSFER_SIZE, - safe_target) < 0) { - virReportOOMError(); - goto cleanup; - } - - ret = qemuMonitorTextMigrate(mon, flags, dest); - -cleanup: - VIR_FREE(safe_target); - VIR_FREE(argstr); - VIR_FREE(dest); - return ret; -} - -int qemuMonitorTextMigrateToUnix(qemuMonitorPtr mon, - unsigned int flags, - const char *unixfile) -{ - char *dest = NULL; - int ret = -1; - - if (virAsprintf(&dest, "unix:%s", unixfile) < 0) { - virReportOOMError(); - return -1; - } - - ret = qemuMonitorTextMigrate(mon, flags, dest); - - VIR_FREE(dest); - - return ret; -} - int qemuMonitorTextMigrateCancel(qemuMonitorPtr mon) { char *info = NULL; diff --git a/src/qemu/qemu_monitor_text.h b/src/qemu/qemu_monitor_text.h index b29dbcc..c2a85b7 100644 --- a/src/qemu/qemu_monitor_text.h +++ b/src/qemu/qemu_monitor_text.h @@ -1,7 +1,7 @@ /* * qemu_monitor_text.h: interaction with QEMU monitor console * - * Copyright (C) 2006-2009 Red Hat, Inc. + * Copyright (C) 2006-2009, 2011 Red Hat, Inc. * Copyright (C) 2006 Daniel P. Berrange * * This library is free software; you can redistribute it and/or @@ -102,24 +102,9 @@ int qemuMonitorTextGetMigrationStatus(qemuMonitorPtr mon, unsigned long long *remaining, unsigned long long *total); -int qemuMonitorTextMigrateToHost(qemuMonitorPtr mon, - unsigned int flags, - const char *hostname, - int port); - -int qemuMonitorTextMigrateToCommand(qemuMonitorPtr mon, - unsigned int flags, - const char * const *argv); - -int qemuMonitorTextMigrateToFile(qemuMonitorPtr mon, - unsigned int flags, - const char * const *argv, - const char *target, - unsigned long long offset); - -int qemuMonitorTextMigrateToUnix(qemuMonitorPtr mon, - unsigned int flags, - const char *unixfile); +int qemuMonitorTextMigrate(qemuMonitorPtr mon, + unsigned int flags, + const char *uri); int qemuMonitorTextMigrateCancel(qemuMonitorPtr mon); -- 1.7.4

On Wed, Mar 09, 2011 at 06:45:42PM -0700, Eric Blake wrote:
* src/qemu/qemu_monitor_text.h (qemuMonitorTextMigrate): Declare in place of individual monitor commands. * src/qemu/qemu_monitor_json.h (qemuMonitorJSONMigrate): Likewise. * src/qemu/qemu_monitor_text.c (qemuMonitorTextMigrateToHost) (qemuMonitorTextMigrateToCommand, qemuMonitorTextMigrateToFile) (qemuMonitorTextMigrateToUnix): Delete. * src/qemu/qemu_monitor_json.c (qemuMonitorJSONMigrateToHost) (qemuMonitorJSONMigrateToCommand, qemuMonitorJSONMigrateToFile) (qemuMonitorJSONMigrateToUnix): Delete. * src/qemu/qemu_monitor.c (qemuMonitorMigrateToHost) (qemuMonitorMigrateToCommand, qemuMonitorMigrateToFile) (qemuMonitorMigrateToUnix): Consolidate shared code. --- src/qemu/qemu_monitor.c | 92 +++++++++++++++++++++++++++---- src/qemu/qemu_monitor_json.c | 125 +---------------------------------------- src/qemu/qemu_monitor_json.h | 23 +------ src/qemu/qemu_monitor_text.c | 123 +---------------------------------------- src/qemu/qemu_monitor_text.h | 23 +------ 5 files changed, 97 insertions(+), 289 deletions(-)
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 03/10/2011 04:58 AM, Daniel P. Berrange wrote:
On Wed, Mar 09, 2011 at 06:45:42PM -0700, Eric Blake wrote:
* src/qemu/qemu_monitor_text.h (qemuMonitorTextMigrate): Declare in place of individual monitor commands. * src/qemu/qemu_monitor_json.h (qemuMonitorJSONMigrate): Likewise. * src/qemu/qemu_monitor_text.c (qemuMonitorTextMigrateToHost) (qemuMonitorTextMigrateToCommand, qemuMonitorTextMigrateToFile) (qemuMonitorTextMigrateToUnix): Delete. * src/qemu/qemu_monitor_json.c (qemuMonitorJSONMigrateToHost) (qemuMonitorJSONMigrateToCommand, qemuMonitorJSONMigrateToFile) (qemuMonitorJSONMigrateToUnix): Delete. * src/qemu/qemu_monitor.c (qemuMonitorMigrateToHost) (qemuMonitorMigrateToCommand, qemuMonitorMigrateToFile) (qemuMonitorMigrateToUnix): Consolidate shared code. --- src/qemu/qemu_monitor.c | 92 +++++++++++++++++++++++++++---- src/qemu/qemu_monitor_json.c | 125 +---------------------------------------- src/qemu/qemu_monitor_json.h | 23 +------ src/qemu/qemu_monitor_text.c | 123 +---------------------------------------- src/qemu/qemu_monitor_text.h | 23 +------ 5 files changed, 97 insertions(+), 289 deletions(-)
ACK
Thanks; I've pushed 1 and 2, and am still in the process of incorporating review comments from the rest of the series and testing that I haven't broken things with the changes. Also, gnulib just barely learned sendfd/recvfd this weekend, and I'll have a go at seeing how using those simplifies things. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

POSIX states about dd: If the bs=expr operand is specified and no conversions other than sync, noerror, or notrunc are requested, the data returned from each input block shall be written as a separate output block; if the read returns less than a full block and the sync conversion is not specified, the resulting output block shall be the same size as the input block. If the bs=expr operand is not specified, or a conversion other than sync, noerror, or notrunc is requested, the input shall be processed and collected into full-sized output blocks until the end of the input is reached. Since we aren't using conv=sync, there is no zero-padding, but our use of bs= means that a short read results in a short write. If instead we use ibs= and obs=, then short reads are collected and dd only has to do a single write, which can make dd more efficient. * src/qemu/qemu_monitor.c (qemuMonitorMigrateToFile): Avoid 'dd bs=', since it can cause short writes. --- src/qemu/qemu_monitor.c | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index eb6f8d4..765a1ae 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -1499,10 +1499,11 @@ int qemuMonitorMigrateToFile(qemuMonitorPtr mon, * <> redirection to avoid truncating a regular file. */ if (virAsprintf(&dest, "exec:" VIR_WRAPPER_SHELL_PREFIX "%s | " "{ dd bs=%llu seek=%llu if=/dev/null && " - "dd bs=%llu; } 1<>%s" VIR_WRAPPER_SHELL_SUFFIX, + "dd ibs=%llu obs=%llu; } 1<>%s" VIR_WRAPPER_SHELL_SUFFIX, argstr, QEMU_MONITOR_MIGRATE_TO_FILE_BS, offset / QEMU_MONITOR_MIGRATE_TO_FILE_BS, QEMU_MONITOR_MIGRATE_TO_FILE_TRANSFER_SIZE, + QEMU_MONITOR_MIGRATE_TO_FILE_TRANSFER_SIZE, safe_target) < 0) { virReportOOMError(); goto cleanup; -- 1.7.4

On Wed, Mar 09, 2011 at 06:45:43PM -0700, Eric Blake wrote:
POSIX states about dd:
If the bs=expr operand is specified and no conversions other than sync, noerror, or notrunc are requested, the data returned from each input block shall be written as a separate output block; if the read returns less than a full block and the sync conversion is not specified, the resulting output block shall be the same size as the input block. If the bs=expr operand is not specified, or a conversion other than sync, noerror, or notrunc is requested, the input shall be processed and collected into full-sized output blocks until the end of the input is reached.
Since we aren't using conv=sync, there is no zero-padding, but our use of bs= means that a short read results in a short write. If instead we use ibs= and obs=, then short reads are collected and dd only has to do a single write, which can make dd more efficient.
* src/qemu/qemu_monitor.c (qemuMonitorMigrateToFile): Avoid 'dd bs=', since it can cause short writes. --- src/qemu/qemu_monitor.c | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-)
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 03/10/2011 04:59 AM, Daniel P. Berrange wrote:
On Wed, Mar 09, 2011 at 06:45:43PM -0700, Eric Blake wrote:
POSIX states about dd:
If the bs=expr operand is specified and no conversions other than sync, noerror, or notrunc are requested, the data returned from each input block shall be written as a separate output block; if the read returns less than a full block and the sync conversion is not specified, the resulting output block shall be the same size as the input block. If the bs=expr operand is not specified, or a conversion other than sync, noerror, or notrunc is requested, the input shall be processed and collected into full-sized output blocks until the end of the input is reached.
Since we aren't using conv=sync, there is no zero-padding, but our use of bs= means that a short read results in a short write. If instead we use ibs= and obs=, then short reads are collected and dd only has to do a single write, which can make dd more efficient.
* src/qemu/qemu_monitor.c (qemuMonitorMigrateToFile): Avoid 'dd bs=', since it can cause short writes. --- src/qemu/qemu_monitor.c | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-)
ACK
Thanks; applied. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

* src/qemu/qemu_monitor.h (qemuMonitorMigrateToFd): New prototype. * src/qemu/qemu_monitor.c (qemuMonitorMigrateToFd): New function. --- src/qemu/qemu_monitor.c | 31 +++++++++++++++++++++++++++++++ src/qemu/qemu_monitor.h | 4 ++++ 2 files changed, 35 insertions(+), 0 deletions(-) diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index 765a1ae..1e79523 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -1384,6 +1384,37 @@ int qemuMonitorGetMigrationStatus(qemuMonitorPtr mon, } +int qemuMonitorMigrateToFd(qemuMonitorPtr mon, + unsigned int flags, + int fd) +{ + int ret; + VIR_DEBUG("mon=%p fd=%d flags=%u", + mon, fd, flags); + + if (!mon) { + qemuReportError(VIR_ERR_INVALID_ARG, "%s", + _("monitor must not be NULL")); + return -1; + } + + if (qemuMonitorSendFileHandle(mon, "migrate", fd) < 0) + return -1; + + if (mon->json) + ret = qemuMonitorJSONMigrate(mon, flags, "fd:migrate"); + else + ret = qemuMonitorTextMigrate(mon, flags, "fd:migrate"); + + if (ret < 0) { + if (qemuMonitorCloseFileHandle(mon, "migrate") < 0) + VIR_WARN0("failed to close migration handle"); + } + + return ret; +} + + int qemuMonitorMigrateToHost(qemuMonitorPtr mon, unsigned int flags, const char *hostname, diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 0ea1330..1a64ac0 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -260,6 +260,10 @@ typedef enum { QEMU_MONITOR_MIGRATION_FLAGS_LAST } QEMU_MONITOR_MIGRATE; +int qemuMonitorMigrateToFd(qemuMonitorPtr mon, + unsigned int flags, + int fd); + int qemuMonitorMigrateToHost(qemuMonitorPtr mon, unsigned int flags, const char *hostname, -- 1.7.4

On Wed, Mar 09, 2011 at 06:45:44PM -0700, Eric Blake wrote:
* src/qemu/qemu_monitor.h (qemuMonitorMigrateToFd): New prototype. * src/qemu/qemu_monitor.c (qemuMonitorMigrateToFd): New function. --- src/qemu/qemu_monitor.c | 31 +++++++++++++++++++++++++++++++ src/qemu/qemu_monitor.h | 4 ++++ 2 files changed, 35 insertions(+), 0 deletions(-)
ACK
diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index 765a1ae..1e79523 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -1384,6 +1384,37 @@ int qemuMonitorGetMigrationStatus(qemuMonitorPtr mon, }
+int qemuMonitorMigrateToFd(qemuMonitorPtr mon, + unsigned int flags, + int fd) +{ + int ret; + VIR_DEBUG("mon=%p fd=%d flags=%u", + mon, fd, flags); + + if (!mon) { + qemuReportError(VIR_ERR_INVALID_ARG, "%s", + _("monitor must not be NULL")); + return -1; + } + + if (qemuMonitorSendFileHandle(mon, "migrate", fd) < 0) + return -1; + + if (mon->json) + ret = qemuMonitorJSONMigrate(mon, flags, "fd:migrate"); + else + ret = qemuMonitorTextMigrate(mon, flags, "fd:migrate"); + + if (ret < 0) { + if (qemuMonitorCloseFileHandle(mon, "migrate") < 0) + VIR_WARN0("failed to close migration handle"); + } + + return ret; +}
It is a nice idea todo the SendFileHAndle/CloseFileHandle calls from here, rather than in the qemu_driver code itself. It makes the latter much clearer - we should think about whether it makes sense to rewrite the NIC hotplug code to work this way too. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 03/10/2011 05:01 AM, Daniel P. Berrange wrote:
On Wed, Mar 09, 2011 at 06:45:44PM -0700, Eric Blake wrote:
* src/qemu/qemu_monitor.h (qemuMonitorMigrateToFd): New prototype. * src/qemu/qemu_monitor.c (qemuMonitorMigrateToFd): New function. --- src/qemu/qemu_monitor.c | 31 +++++++++++++++++++++++++++++++ src/qemu/qemu_monitor.h | 4 ++++ 2 files changed, 35 insertions(+), 0 deletions(-)
ACK
Thanks; pushed.
+ if (qemuMonitorSendFileHandle(mon, "migrate", fd) < 0) + return -1; + + if (mon->json) + ret = qemuMonitorJSONMigrate(mon, flags, "fd:migrate"); + else + ret = qemuMonitorTextMigrate(mon, flags, "fd:migrate"); + + if (ret < 0) { + if (qemuMonitorCloseFileHandle(mon, "migrate") < 0) + VIR_WARN0("failed to close migration handle"); + } + + return ret; +}
It is a nice idea todo the SendFileHAndle/CloseFileHandle calls from here, rather than in the qemu_driver code itself. It makes the latter much clearer - we should think about whether it makes sense to rewrite the NIC hotplug code to work this way too.
Agreed; I'll submit a patch for that shortly. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

This is also a bug fix - on the error path, qemu_hotplug would leave the configfd file leaked into qemu. At least the next attempt to hotplug a PCI device would reuse the same fdname, and when the qemu getfd monitor command gets a new fd by the same name as an earlier one, it closes the earlier one, so there is no risk of qemu running out of fds. * src/qemu/qemu_monitor.h (qemuMonitorAddDeviceWithFd): New prototype. * src/qemu/qemu_monitor.c (qemuMonitorAddDevice): Move guts... (qemuMonitorAddDeviceWithFd): ...to new function, and add support for fd passing. * src/qemu/qemu_hotplug.c (qemuDomainAttachHostPciDevice): Use it to simplify code. ---
It is a nice idea todo the SendFileHAndle/CloseFileHandle calls from here, rather than in the qemu_driver code itself. It makes the latter much clearer - we should think about whether it makes sense to rewrite the NIC hotplug code to work this way too.
Agreed; I'll submit a patch for that shortly.
Thanks for requesting this - I found and plugged a resource leak in the process. NIC hotplug cleanup coming next. src/qemu/qemu_hotplug.c | 19 +++++++++---------- src/qemu/qemu_monitor.c | 24 +++++++++++++++++++++--- src/qemu/qemu_monitor.h | 5 +++++ 3 files changed, 35 insertions(+), 13 deletions(-) diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index e4ba526..40e4159 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -829,21 +829,19 @@ int qemuDomainAttachHostPciDevice(struct qemud_driver *driver, if (qemuDomainPCIAddressEnsureAddr(priv->pciaddrs, &hostdev->info) < 0) goto error; if (qemuCapsGet(qemuCaps, QEMU_CAPS_PCI_CONFIGFD)) { - configfd = qemuOpenPCIConfig(hostdev); + if (priv->monConfig->type != VIR_DOMAIN_CHR_TYPE_UNIX) { + qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("pci configfd file cannot be attached: " + "qemu is not using a unix socket monitor")); + } else { + configfd = qemuOpenPCIConfig(hostdev); + } if (configfd >= 0) { if (virAsprintf(&configfd_name, "fd-%s", hostdev->info.alias) < 0) { virReportOOMError(); goto error; } - - qemuDomainObjEnterMonitorWithDriver(driver, vm); - if (qemuMonitorSendFileHandle(priv->mon, configfd_name, - configfd) < 0) { - qemuDomainObjExitMonitorWithDriver(driver, vm); - goto error; - } - qemuDomainObjExitMonitorWithDriver(driver, vm); } } @@ -858,7 +856,8 @@ int qemuDomainAttachHostPciDevice(struct qemud_driver *driver, goto error; qemuDomainObjEnterMonitorWithDriver(driver, vm); - ret = qemuMonitorAddDevice(priv->mon, devstr); + ret = qemuMonitorAddDeviceWithFd(priv->mon, devstr, + configfd, configfd_name); qemuDomainObjExitMonitorWithDriver(driver, vm); } else { virDomainDevicePCIAddress guestAddr; diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index da38096..f2b1721 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -2016,10 +2016,13 @@ int qemuMonitorDelDevice(qemuMonitorPtr mon, } -int qemuMonitorAddDevice(qemuMonitorPtr mon, - const char *devicestr) +int qemuMonitorAddDeviceWithFd(qemuMonitorPtr mon, + const char *devicestr, + int fd, + const char *fdname) { - VIR_DEBUG("mon=%p device=%s", mon, devicestr); + VIR_DEBUG("mon=%p device=%s fd=%d fdname=%s", mon, devicestr, fd, + NULLSTR(fdname)); int ret; if (!mon) { @@ -2028,13 +2031,28 @@ int qemuMonitorAddDevice(qemuMonitorPtr mon, return -1; } + if (fd >= 0 && qemuMonitorSendFileHandle(mon, fdname, fd) < 0) + return -1; + if (mon->json) ret = qemuMonitorJSONAddDevice(mon, devicestr); else ret = qemuMonitorTextAddDevice(mon, devicestr); + + if (ret < 0 && fd >= 0) { + if (qemuMonitorCloseFileHandle(mon, fdname) < 0) + VIR_WARN("failed to close device handle '%s'", fdname); + } + return ret; } +int qemuMonitorAddDevice(qemuMonitorPtr mon, + const char *devicestr) +{ + return qemuMonitorAddDeviceWithFd(mon, devicestr, -1, NULL); +} + int qemuMonitorAddDrive(qemuMonitorPtr mon, const char *drivestr) { diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 7bea083..a20ff8e 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -390,6 +390,11 @@ int qemuMonitorGetAllPCIAddresses(qemuMonitorPtr mon, int qemuMonitorAddDevice(qemuMonitorPtr mon, const char *devicestr); +int qemuMonitorAddDeviceWithFd(qemuMonitorPtr mon, + const char *devicestr, + int fd, + const char *fdname); + int qemuMonitorDelDevice(qemuMonitorPtr mon, const char *devalias); -- 1.7.4

* Eric Blake (eblake@redhat.com) wrote:
@@ -829,21 +829,19 @@ int qemuDomainAttachHostPciDevice(struct qemud_driver *driver, if (qemuDomainPCIAddressEnsureAddr(priv->pciaddrs, &hostdev->info) < 0) goto error; if (qemuCapsGet(qemuCaps, QEMU_CAPS_PCI_CONFIGFD)) { - configfd = qemuOpenPCIConfig(hostdev); + if (priv->monConfig->type != VIR_DOMAIN_CHR_TYPE_UNIX) { + qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("pci configfd file cannot be attached: " + "qemu is not using a unix socket monitor"));
Should that be in common sendfd code?

On 03/15/2011 06:19 PM, Chris Wright wrote:
* Eric Blake (eblake@redhat.com) wrote:
@@ -829,21 +829,19 @@ int qemuDomainAttachHostPciDevice(struct qemud_driver *driver, if (qemuDomainPCIAddressEnsureAddr(priv->pciaddrs, &hostdev->info) < 0) goto error; if (qemuCapsGet(qemuCaps, QEMU_CAPS_PCI_CONFIGFD)) { - configfd = qemuOpenPCIConfig(hostdev); + if (priv->monConfig->type != VIR_DOMAIN_CHR_TYPE_UNIX) { + qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("pci configfd file cannot be attached: " + "qemu is not using a unix socket monitor"));
Should that be in common sendfd code?
Yeah, factoring that into a common location _would_ be nicer. I'll whip up a patch. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

Currently, the hook function in virFileOperation is extremely limited: it must be async-signal-safe, and cannot modify any memory in the parent process. It is much handier to return a valid fd and operate on it in the parent than to deal with hook restrictions. * src/util/util.h (VIR_FILE_OP_RETURN_FD): New flag. * src/util/util.c (virFileOperationNoFork, virFileOperation): Honor new flag. --- src/util/util.c | 139 ++++++++++++++++++++++++++++++++++++++++++++++++------ src/util/util.h | 4 +- 2 files changed, 126 insertions(+), 17 deletions(-) diff --git a/src/util/util.c b/src/util/util.c index f41e117..6bf3219 100644 --- a/src/util/util.c +++ b/src/util/util.c @@ -1392,14 +1392,15 @@ static int virFileOperationNoFork(const char *path, int openflags, mode_t mode, if ((hook) && ((ret = hook(fd, hookdata)) != 0)) { goto error; } + if (flags & VIR_FILE_OP_RETURN_FD) + return fd; if (VIR_CLOSE(fd) < 0) { ret = -errno; virReportSystemError(errno, _("failed to close new file '%s'"), path); - fd = -1; goto error; } - fd = -1; + error: VIR_FORCE_CLOSE(fd); return ret; @@ -1444,7 +1445,32 @@ error: return ret; } -/* return -errno on failure, or 0 on success */ +/** + * virFileOperation: + * @path: file to open or create + * @openflags: flags to pass to open + * @mode: mode to use on creation or when forcing permissions + * @uid: uid that should own file on creation + * @gid: gid that should own file + * @hook: callback to run once file is open; see below for restrictions + * @hookdata: opaque data for @hook + * @flags: bit-wise or of VIR_FILE_OP_* flags + * + * Open @path, and execute an optional callback @hook on the open file + * description. @hook must return 0 on success, or -errno on failure. + * If @flags includes VIR_FILE_OP_AS_UID, then open the file while the + * effective user id is @uid (by using a child process); this allows + * one to bypass root-squashing NFS issues. If @flags includes + * VIR_FILE_OP_FORCE_PERMS, then ensure that @path has those + * permissions before the callback, even if the file already existed + * with different permissions. If @flags includes + * VIR_FILE_OP_RETURN_FD, then use SCM_RIGHTS to guarantee that any + * @hook is run in the parent, and the return value (if non-negative) + * is the file descriptor, left open. Otherwise, @hook might be run + * in a child process, so it must be async-signal-safe (ie. no + * malloc() or anything else that depends on modifying locks held in + * the parent), no file descriptor remains open, and 0 is returned on + * success. Return -errno on failure. */ int virFileOperation(const char *path, int openflags, mode_t mode, uid_t uid, gid_t gid, virFileOperationHook hook, void *hookdata, @@ -1452,7 +1478,14 @@ int virFileOperation(const char *path, int openflags, mode_t mode, struct stat st; pid_t pid; int waitret, status, ret = 0; - int fd; + int fd = -1; + int pair[2] = { -1, -1 }; + struct msghdr msg; + struct cmsghdr *cmsg; + char buf[CMSG_SPACE(sizeof(fd))]; + char dummy = 0; + struct iovec iov; + int forkRet; if ((!(flags & VIR_FILE_OP_AS_UID)) || (getuid() != 0) @@ -1466,7 +1499,29 @@ int virFileOperation(const char *path, int openflags, mode_t mode, * following dance avoids problems caused by root-squashing * NFS servers. */ - int forkRet = virFork(&pid); + if (flags & VIR_FILE_OP_RETURN_FD) { + if (socketpair(AF_UNIX, SOCK_DGRAM, 0, pair) < 0) { + ret = -errno; + virReportSystemError(errno, + _("failed to create socket needed for '%s'"), + path); + return ret; + } + + memset(&msg, 0, sizeof(msg)); + iov.iov_base = &dummy; + iov.iov_len = 1; + msg.msg_iov = &iov; + msg.msg_iovlen = 1; + msg.msg_control = buf; + msg.msg_controllen = sizeof(buf); + cmsg = CMSG_FIRSTHDR(&msg); + cmsg->cmsg_level = SOL_SOCKET; + cmsg->cmsg_type = SCM_RIGHTS; + cmsg->cmsg_len = CMSG_LEN(sizeof(fd)); + } + + forkRet = virFork(&pid); if (pid < 0) { ret = -errno; @@ -1474,6 +1529,31 @@ int virFileOperation(const char *path, int openflags, mode_t mode, } if (pid) { /* parent */ + if (flags & VIR_FILE_OP_RETURN_FD) { + VIR_FORCE_CLOSE(pair[1]); + + do { + ret = recvmsg(pair[0], &msg, 0); + } while (ret < 0 && errno == EINTR); + + if (ret < 0) { + ret = -errno; + VIR_FORCE_CLOSE(pair[0]); + while ((waitret = waitpid(pid, NULL, 0) == -1) + && (errno == EINTR)); + goto parenterror; + } + VIR_FORCE_CLOSE(pair[0]); + + /* See if fd was transferred. */ + cmsg = CMSG_FIRSTHDR(&msg); + if (cmsg && cmsg->cmsg_len == CMSG_LEN(sizeof(fd)) && + cmsg->cmsg_level == SOL_SOCKET && + cmsg->cmsg_type == SCM_RIGHTS) { + memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd)); + } + } + /* wait for child to complete, and retrieve its exit code */ while ((waitret = waitpid(pid, &status, 0) == -1) && (errno == EINTR)); @@ -1485,12 +1565,20 @@ int virFileOperation(const char *path, int openflags, mode_t mode, goto parenterror; } ret = -WEXITSTATUS(status); - if (!WIFEXITED(status) || (ret == -EACCES)) { + if (!WIFEXITED(status) || (ret == -EACCES) || + ((flags & VIR_FILE_OP_RETURN_FD) && fd == -1)) { /* fall back to the simpler method, which works better in * some cases */ return virFileOperationNoFork(path, openflags, mode, uid, gid, hook, hookdata, flags); } + if (!ret && (flags & VIR_FILE_OP_RETURN_FD)) { + if ((hook) && ((ret = hook(fd, hookdata)) != 0)) { + VIR_FORCE_CLOSE(fd); + goto parenterror; + } + ret = fd; + } parenterror: return ret; } @@ -1500,6 +1588,7 @@ parenterror: if (forkRet < 0) { /* error encountered and logged in virFork() after the fork. */ + ret = -errno; goto childerror; } @@ -1549,15 +1638,33 @@ parenterror: path, mode); goto childerror; } - if ((hook) && ((ret = hook(fd, hookdata)) != 0)) { - goto childerror; - } - if (VIR_CLOSE(fd) < 0) { - ret = -errno; - virReportSystemError(errno, _("child failed to close new file '%s'"), - path); - goto childerror; + if (flags & VIR_FILE_OP_RETURN_FD) { + VIR_FORCE_CLOSE(pair[0]); + memcpy(CMSG_DATA(cmsg), &fd, sizeof(fd)); + + do { + ret = sendmsg(pair[1], &msg, 0); + } while (ret < 0 && errno == EINTR); + + if (ret < 0) { + ret = -errno; + VIR_FORCE_CLOSE(pair[1]); + goto childerror; + } + ret = 0; + } else { + if ((hook) && ((ret = hook(fd, hookdata)) != 0)) + goto childerror; + if (VIR_CLOSE(fd) < 0) { + ret = -errno; + virReportSystemError(errno, + _("child failed to close new file '%s'"), + path); + goto childerror; + } } + + ret = 0; childerror: /* ret tracks -errno on failure, but exit value must be positive. * If the child exits with EACCES, then the parent tries again. */ @@ -1688,7 +1795,7 @@ int virFileOperation(const char *path ATTRIBUTE_UNUSED, virUtilError(VIR_ERR_INTERNAL_ERROR, "%s", _("virFileOperation is not implemented for WIN32")); - return -1; + return -ENOSYS; } int virDirCreate(const char *path ATTRIBUTE_UNUSED, @@ -1700,7 +1807,7 @@ int virDirCreate(const char *path ATTRIBUTE_UNUSED, virUtilError(VIR_ERR_INTERNAL_ERROR, "%s", _("virDirCreate is not implemented for WIN32")); - return -1; + return -ENOSYS; } #endif /* WIN32 */ diff --git a/src/util/util.h b/src/util/util.h index 5f6473c..3970d58 100644 --- a/src/util/util.h +++ b/src/util/util.h @@ -128,12 +128,14 @@ enum { VIR_FILE_OP_NONE = 0, VIR_FILE_OP_AS_UID = (1 << 0), VIR_FILE_OP_FORCE_PERMS = (1 << 1), + VIR_FILE_OP_RETURN_FD = (1 << 2), }; typedef int (*virFileOperationHook)(int fd, void *data); int virFileOperation(const char *path, int openflags, mode_t mode, uid_t uid, gid_t gid, virFileOperationHook hook, void *hookdata, - unsigned int flags) ATTRIBUTE_RETURN_CHECK; + unsigned int flags) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_RETURN_CHECK; enum { VIR_DIR_CREATE_NONE = 0, -- 1.7.4

On Wed, Mar 09, 2011 at 06:45:45PM -0700, Eric Blake wrote:
Currently, the hook function in virFileOperation is extremely limited: it must be async-signal-safe, and cannot modify any memory in the parent process. It is much handier to return a valid fd and operate on it in the parent than to deal with hook restrictions.
* src/util/util.h (VIR_FILE_OP_RETURN_FD): New flag. * src/util/util.c (virFileOperationNoFork, virFileOperation): Honor new flag. --- src/util/util.c | 139 ++++++++++++++++++++++++++++++++++++++++++++++++------ src/util/util.h | 4 +- 2 files changed, 126 insertions(+), 17 deletions(-)
ACK,
+ memset(&msg, 0, sizeof(msg)); + iov.iov_base = &dummy; + iov.iov_len = 1; + msg.msg_iov = &iov; + msg.msg_iovlen = 1; + msg.msg_control = buf; + msg.msg_controllen = sizeof(buf); + cmsg = CMSG_FIRSTHDR(&msg); + cmsg->cmsg_level = SOL_SOCKET; + cmsg->cmsg_type = SCM_RIGHTS; + cmsg->cmsg_len = CMSG_LEN(sizeof(fd)); + }
....
+ if (flags & VIR_FILE_OP_RETURN_FD) { + VIR_FORCE_CLOSE(pair[1]); + + do { + ret = recvmsg(pair[0], &msg, 0); + } while (ret < 0 && errno == EINTR); + + if (ret < 0) { + ret = -errno; + VIR_FORCE_CLOSE(pair[0]); + while ((waitret = waitpid(pid, NULL, 0) == -1) + && (errno == EINTR)); + goto parenterror; + } + VIR_FORCE_CLOSE(pair[0]); + + /* See if fd was transferred. */ + cmsg = CMSG_FIRSTHDR(&msg); + if (cmsg && cmsg->cmsg_len == CMSG_LEN(sizeof(fd)) && + cmsg->cmsg_level == SOL_SOCKET && + cmsg->cmsg_type == SCM_RIGHTS) { + memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd)); + }
Also might be nice to create two helper methods for this int virFileHandleSendToSocket(int unixfd, int sendfd); int virFileHandleRecvFromSocket(int unixfd); which we can also use in the monitor code that does the same thing Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 03/10/2011 05:03 AM, Daniel P. Berrange wrote:
+ + /* See if fd was transferred. */ + cmsg = CMSG_FIRSTHDR(&msg); + if (cmsg && cmsg->cmsg_len == CMSG_LEN(sizeof(fd)) && + cmsg->cmsg_level == SOL_SOCKET && + cmsg->cmsg_type == SCM_RIGHTS) { + memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd)); + }
Also might be nice to create two helper methods for this
int virFileHandleSendToSocket(int unixfd, int sendfd); int virFileHandleRecvFromSocket(int unixfd);
which we can also use in the monitor code that does the same thing
Actually, my plans are to import the gnulib module passfd, which defines sendfd()/recvfd(), and use those. But I'd rather defer that to post-0.9.0, and keep this patch doing things manually, in case someone ever needs to backport these patches to RHEL without also pulling in a gnulib submodule update. Unfortunately, the monitor code probably can't use sendfd(), because it makes assumptions that the receiver is using recvfd() but the qemu monitor code is actually expecting to parse the fd off the 'getfd' monitor command rather than a one-byte dummy command. But we may have other places where sendfd/recvfd makes sense (such as if we need to temporarily change umask while creating a socket). -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On 03/09/2011 08:45 PM, Eric Blake wrote:
Currently, the hook function in virFileOperation is extremely limited: it must be async-signal-safe, and cannot modify any memory in the parent process. It is much handier to return a valid fd and operate on it in the parent than to deal with hook restrictions.
* src/util/util.h (VIR_FILE_OP_RETURN_FD): New flag. * src/util/util.c (virFileOperationNoFork, virFileOperation): Honor new flag. --- src/util/util.c | 139 ++++++++++++++++++++++++++++++++++++++++++++++++------ src/util/util.h | 4 +- 2 files changed, 126 insertions(+), 17 deletions(-)
diff --git a/src/util/util.c b/src/util/util.c index f41e117..6bf3219 100644 --- a/src/util/util.c +++ b/src/util/util.c @@ -1392,14 +1392,15 @@ static int virFileOperationNoFork(const char *path, int openflags, mode_t mode, if ((hook)&& ((ret = hook(fd, hookdata)) != 0)) { goto error; } + if (flags& VIR_FILE_OP_RETURN_FD) + return fd; if (VIR_CLOSE(fd)< 0) { ret = -errno; virReportSystemError(errno, _("failed to close new file '%s'"), path); - fd = -1; goto error; } - fd = -1; + error: VIR_FORCE_CLOSE(fd); return ret; @@ -1444,7 +1445,32 @@ error: return ret; }
-/* return -errno on failure, or 0 on success */ +/** + * virFileOperation: + * @path: file to open or create + * @openflags: flags to pass to open + * @mode: mode to use on creation or when forcing permissions + * @uid: uid that should own file on creation + * @gid: gid that should own file + * @hook: callback to run once file is open; see below for restrictions + * @hookdata: opaque data for @hook + * @flags: bit-wise or of VIR_FILE_OP_* flags + * + * Open @path, and execute an optional callback @hook on the open file + * description. @hook must return 0 on success, or -errno on failure. + * If @flags includes VIR_FILE_OP_AS_UID, then open the file while the + * effective user id is @uid (by using a child process); this allows + * one to bypass root-squashing NFS issues. If @flags includes + * VIR_FILE_OP_FORCE_PERMS, then ensure that @path has those + * permissions before the callback, even if the file already existed + * with different permissions. If @flags includes + * VIR_FILE_OP_RETURN_FD, then use SCM_RIGHTS to guarantee that any + * @hook is run in the parent, and the return value (if non-negative) + * is the file descriptor, left open. Otherwise, @hook might be run + * in a child process, so it must be async-signal-safe (ie. no + * malloc() or anything else that depends on modifying locks held in + * the parent), no file descriptor remains open, and 0 is returned on + * success. Return -errno on failure. */ int virFileOperation(const char *path, int openflags, mode_t mode, uid_t uid, gid_t gid, virFileOperationHook hook, void *hookdata, @@ -1452,7 +1478,14 @@ int virFileOperation(const char *path, int openflags, mode_t mode, struct stat st; pid_t pid; int waitret, status, ret = 0; - int fd; + int fd = -1; + int pair[2] = { -1, -1 }; + struct msghdr msg; + struct cmsghdr *cmsg; + char buf[CMSG_SPACE(sizeof(fd))]; + char dummy = 0; + struct iovec iov; + int forkRet;
if ((!(flags& VIR_FILE_OP_AS_UID)) || (getuid() != 0) @@ -1466,7 +1499,29 @@ int virFileOperation(const char *path, int openflags, mode_t mode, * following dance avoids problems caused by root-squashing * NFS servers. */
- int forkRet = virFork(&pid); + if (flags& VIR_FILE_OP_RETURN_FD) { + if (socketpair(AF_UNIX, SOCK_DGRAM, 0, pair)< 0) {
As mentioned in other mail, if you change SOCK_DGRAM to SOCK_STREAM, the parent will no longer hang on recvmsg() if the child neglects to call sendmsg().
+ ret = -errno; + virReportSystemError(errno, + _("failed to create socket needed for '%s'"), + path); + return ret; + } + + memset(&msg, 0, sizeof(msg)); + iov.iov_base =&dummy; + iov.iov_len = 1; + msg.msg_iov =&iov; + msg.msg_iovlen = 1; + msg.msg_control = buf; + msg.msg_controllen = sizeof(buf); + cmsg = CMSG_FIRSTHDR(&msg); + cmsg->cmsg_level = SOL_SOCKET; + cmsg->cmsg_type = SCM_RIGHTS; + cmsg->cmsg_len = CMSG_LEN(sizeof(fd)); + } + + forkRet = virFork(&pid);
if (pid< 0) { ret = -errno; @@ -1474,6 +1529,31 @@ int virFileOperation(const char *path, int openflags, mode_t mode, }
if (pid) { /* parent */ + if (flags& VIR_FILE_OP_RETURN_FD) { + VIR_FORCE_CLOSE(pair[1]); + + do { + ret = recvmsg(pair[0],&msg, 0); + } while (ret< 0&& errno == EINTR); + + if (ret< 0) { + ret = -errno; + VIR_FORCE_CLOSE(pair[0]); + while ((waitret = waitpid(pid, NULL, 0) == -1) +&& (errno == EINTR)); + goto parenterror; + } + VIR_FORCE_CLOSE(pair[0]); + + /* See if fd was transferred. */ + cmsg = CMSG_FIRSTHDR(&msg); + if (cmsg&& cmsg->cmsg_len == CMSG_LEN(sizeof(fd))&& + cmsg->cmsg_level == SOL_SOCKET&& + cmsg->cmsg_type == SCM_RIGHTS) { + memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd)); + } + } + /* wait for child to complete, and retrieve its exit code */ while ((waitret = waitpid(pid,&status, 0) == -1) && (errno == EINTR)); @@ -1485,12 +1565,20 @@ int virFileOperation(const char *path, int openflags, mode_t mode, goto parenterror; } ret = -WEXITSTATUS(status); - if (!WIFEXITED(status) || (ret == -EACCES)) { + if (!WIFEXITED(status) || (ret == -EACCES) || + ((flags& VIR_FILE_OP_RETURN_FD)&& fd == -1)) { /* fall back to the simpler method, which works better in * some cases */ return virFileOperationNoFork(path, openflags, mode, uid, gid, hook, hookdata, flags); } + if (!ret&& (flags& VIR_FILE_OP_RETURN_FD)) { + if ((hook)&& ((ret = hook(fd, hookdata)) != 0)) { + VIR_FORCE_CLOSE(fd); + goto parenterror; + } + ret = fd; + } parenterror: return ret; } @@ -1500,6 +1588,7 @@ parenterror:
if (forkRet< 0) { /* error encountered and logged in virFork() after the fork. */ + ret = -errno; goto childerror; }
@@ -1549,15 +1638,33 @@ parenterror: path, mode); goto childerror; } - if ((hook)&& ((ret = hook(fd, hookdata)) != 0)) { - goto childerror; - } - if (VIR_CLOSE(fd)< 0) { - ret = -errno; - virReportSystemError(errno, _("child failed to close new file '%s'"), - path); - goto childerror; + if (flags& VIR_FILE_OP_RETURN_FD) { + VIR_FORCE_CLOSE(pair[0]); + memcpy(CMSG_DATA(cmsg),&fd, sizeof(fd)); + + do { + ret = sendmsg(pair[1],&msg, 0); + } while (ret< 0&& errno == EINTR); + + if (ret< 0) { + ret = -errno; + VIR_FORCE_CLOSE(pair[1]);
Why not just always close pair[1]? (right now you're depending on _exit() to close it)
+ goto childerror; + } + ret = 0; + } else { + if ((hook)&& ((ret = hook(fd, hookdata)) != 0)) + goto childerror; + if (VIR_CLOSE(fd)< 0) { + ret = -errno; + virReportSystemError(errno, + _("child failed to close new file '%s'"), + path); + goto childerror; + } } + + ret = 0; childerror: /* ret tracks -errno on failure, but exit value must be positive. * If the child exits with EACCES, then the parent tries again. */ @@ -1688,7 +1795,7 @@ int virFileOperation(const char *path ATTRIBUTE_UNUSED, virUtilError(VIR_ERR_INTERNAL_ERROR, "%s", _("virFileOperation is not implemented for WIN32"));
- return -1; + return -ENOSYS; }
int virDirCreate(const char *path ATTRIBUTE_UNUSED, @@ -1700,7 +1807,7 @@ int virDirCreate(const char *path ATTRIBUTE_UNUSED, virUtilError(VIR_ERR_INTERNAL_ERROR, "%s", _("virDirCreate is not implemented for WIN32"));
- return -1; + return -ENOSYS; } #endif /* WIN32 */
diff --git a/src/util/util.h b/src/util/util.h index 5f6473c..3970d58 100644 --- a/src/util/util.h +++ b/src/util/util.h @@ -128,12 +128,14 @@ enum { VIR_FILE_OP_NONE = 0, VIR_FILE_OP_AS_UID = (1<< 0), VIR_FILE_OP_FORCE_PERMS = (1<< 1), + VIR_FILE_OP_RETURN_FD = (1<< 2), }; typedef int (*virFileOperationHook)(int fd, void *data); int virFileOperation(const char *path, int openflags, mode_t mode, uid_t uid, gid_t gid, virFileOperationHook hook, void *hookdata, - unsigned int flags) ATTRIBUTE_RETURN_CHECK; + unsigned int flags) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_RETURN_CHECK;
enum { VIR_DIR_CREATE_NONE = 0,
ACK with the above two nits fixed.

This allows direct saves (no compression, no root-squash NFS) to use the more efficient fd: migration, which in turn avoids a race where qemu exec: migration can sometimes fail because qemu does a generic waitpid() that conflicts with the pclose() used by exec:. Further patches will solve compression and root-squash NFS. * src/qemu/qemu_driver.c (qemudDomainSaveFlag): Use new function when there is no compression. --- src/qemu/qemu_driver.c | 26 +++++++++++++++++++++++--- 1 files changed, 23 insertions(+), 3 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 2d11da8..c4d3c85 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -1790,6 +1790,7 @@ static int qemudDomainSaveFlag(struct qemud_driver *driver, virDomainPtr dom, int is_reg = 0; unsigned long long offset; virCgroupPtr cgroup = NULL; + virBitmapPtr qemuCaps = NULL; memset(&header, 0, sizeof(header)); memcpy(header.magic, QEMUD_SAVE_MAGIC, sizeof(header.magic)); @@ -1820,6 +1821,11 @@ static int qemudDomainSaveFlag(struct qemud_driver *driver, virDomainPtr dom, } } + if (qemuCapsExtractVersionInfo(vm->def->emulator, vm->def->os.arch, + NULL, + &qemuCaps) < 0) + goto endjob; + /* Get XML for the domain */ xml = virDomainDefFormat(vm->def, VIR_DOMAIN_XML_SECURE); if (!xml) { @@ -1980,10 +1986,23 @@ static int qemudDomainSaveFlag(struct qemud_driver *driver, virDomainPtr dom, if (header.compressed == QEMUD_SAVE_FORMAT_RAW) { const char *args[] = { "cat", NULL }; + /* XXX gross - why don't we reuse the fd already opened earlier */ + int fd = -1; + + if (qemuCapsGet(qemuCaps, QEMU_CAPS_MIGRATE_QEMU_FD) && + priv->monConfig->type == VIR_DOMAIN_CHR_TYPE_UNIX) + fd = open(path, O_WRONLY); qemuDomainObjEnterMonitorWithDriver(driver, vm); - rc = qemuMonitorMigrateToFile(priv->mon, - QEMU_MONITOR_MIGRATE_BACKGROUND, - args, path, offset); + if (fd >= 0 && lseek(fd, offset, SEEK_SET) == offset) { + rc = qemuMonitorMigrateToFd(priv->mon, + QEMU_MONITOR_MIGRATE_BACKGROUND, + fd); + } else { + rc = qemuMonitorMigrateToFile(priv->mon, + QEMU_MONITOR_MIGRATE_BACKGROUND, + args, path, offset); + } + VIR_FORCE_CLOSE(fd); qemuDomainObjExitMonitorWithDriver(driver, vm); } else { const char *prog = qemudSaveCompressionTypeToString(header.compressed); @@ -2065,6 +2084,7 @@ endjob: } cleanup: + qemuCapsFree(qemuCaps); VIR_FREE(xml); if (ret != 0 && is_reg) unlink(path); -- 1.7.4

On Wed, Mar 09, 2011 at 06:45:46PM -0700, Eric Blake wrote:
This allows direct saves (no compression, no root-squash NFS) to use the more efficient fd: migration, which in turn avoids a race where qemu exec: migration can sometimes fail because qemu does a generic waitpid() that conflicts with the pclose() used by exec:. Further patches will solve compression and root-squash NFS.
* src/qemu/qemu_driver.c (qemudDomainSaveFlag): Use new function when there is no compression. --- src/qemu/qemu_driver.c | 26 +++++++++++++++++++++++--- 1 files changed, 23 insertions(+), 3 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 2d11da8..c4d3c85 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -1790,6 +1790,7 @@ static int qemudDomainSaveFlag(struct qemud_driver *driver, virDomainPtr dom, int is_reg = 0; unsigned long long offset; virCgroupPtr cgroup = NULL; + virBitmapPtr qemuCaps = NULL;
memset(&header, 0, sizeof(header)); memcpy(header.magic, QEMUD_SAVE_MAGIC, sizeof(header.magic)); @@ -1820,6 +1821,11 @@ static int qemudDomainSaveFlag(struct qemud_driver *driver, virDomainPtr dom, } }
+ if (qemuCapsExtractVersionInfo(vm->def->emulator, vm->def->os.arch, + NULL, + &qemuCaps) < 0) + goto endjob; + /* Get XML for the domain */ xml = virDomainDefFormat(vm->def, VIR_DOMAIN_XML_SECURE); if (!xml) { @@ -1980,10 +1986,23 @@ static int qemudDomainSaveFlag(struct qemud_driver *driver, virDomainPtr dom,
if (header.compressed == QEMUD_SAVE_FORMAT_RAW) { const char *args[] = { "cat", NULL }; + /* XXX gross - why don't we reuse the fd already opened earlier */ + int fd = -1; + + if (qemuCapsGet(qemuCaps, QEMU_CAPS_MIGRATE_QEMU_FD) && + priv->monConfig->type == VIR_DOMAIN_CHR_TYPE_UNIX) + fd = open(path, O_WRONLY);
Ok, so this is the bit which causes a regression on NFS rootsquash, if the path isn't readable by root.
qemuDomainObjEnterMonitorWithDriver(driver, vm); - rc = qemuMonitorMigrateToFile(priv->mon, - QEMU_MONITOR_MIGRATE_BACKGROUND, - args, path, offset); + if (fd >= 0 && lseek(fd, offset, SEEK_SET) == offset) { + rc = qemuMonitorMigrateToFd(priv->mon, + QEMU_MONITOR_MIGRATE_BACKGROUND, + fd); + } else { + rc = qemuMonitorMigrateToFile(priv->mon, + QEMU_MONITOR_MIGRATE_BACKGROUND, + args, path, offset); + } + VIR_FORCE_CLOSE(fd); qemuDomainObjExitMonitorWithDriver(driver, vm); } else { const char *prog = qemudSaveCompressionTypeToString(header.compressed);
Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 03/10/2011 05:05 AM, Daniel P. Berrange wrote:
On Wed, Mar 09, 2011 at 06:45:46PM -0700, Eric Blake wrote:
This allows direct saves (no compression, no root-squash NFS) to use the more efficient fd: migration, which in turn avoids a race where qemu exec: migration can sometimes fail because qemu does a generic waitpid() that conflicts with the pclose() used by exec:. Further patches will solve compression and root-squash NFS.
* src/qemu/qemu_driver.c (qemudDomainSaveFlag): Use new function when there is no compression. --- src/qemu/qemu_driver.c | 26 +++++++++++++++++++++++--- 1 files changed, 23 insertions(+), 3 deletions(-)
+ /* XXX gross - why don't we reuse the fd already opened earlier */ + int fd = -1; + + if (qemuCapsGet(qemuCaps, QEMU_CAPS_MIGRATE_QEMU_FD) && + priv->monConfig->type == VIR_DOMAIN_CHR_TYPE_UNIX) + fd = open(path, O_WRONLY);
Ok, so this is the bit which causes a regression on NFS rootsquash, if the path isn't readable by root.
Nope; no regression - just no fd: migration (it's a graceful fallback to exec: migration on NFS rootsquash until later patches in the series).
qemuDomainObjEnterMonitorWithDriver(driver, vm); - rc = qemuMonitorMigrateToFile(priv->mon, - QEMU_MONITOR_MIGRATE_BACKGROUND, - args, path, offset); + if (fd >= 0 && lseek(fd, offset, SEEK_SET) == offset) { + rc = qemuMonitorMigrateToFd(priv->mon, + QEMU_MONITOR_MIGRATE_BACKGROUND, + fd); + } else { + rc = qemuMonitorMigrateToFile(priv->mon, + QEMU_MONITOR_MIGRATE_BACKGROUND, + args, path, offset); + } + VIR_FORCE_CLOSE(fd);
Did this patch get ACK? I'm thinking of shuffling things and pushing this prior to 5/15 (introduction of SCM_RIGHTS); patch 5 and 6 have no direct influence on one another. But I'm also annoyed at having to open qemuCaps, and am thinking about fixing that first. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

This makes root-squash NFS saves more efficient. * src/qemu/qemu_driver.c (qemudDomainSaveFlag): Use new virFileOperation flag to open fd only once. --- src/qemu/qemu_driver.c | 75 +++++++++++++++++++++++++----------------------- 1 files changed, 39 insertions(+), 36 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index c4d3c85..a73c5b9 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -1791,6 +1791,7 @@ static int qemudDomainSaveFlag(struct qemud_driver *driver, virDomainPtr dom, unsigned long long offset; virCgroupPtr cgroup = NULL; virBitmapPtr qemuCaps = NULL; + int fd = -1; memset(&header, 0, sizeof(header)); memcpy(header.magic, QEMUD_SAVE_MAGIC, sizeof(header.magic)); @@ -1869,45 +1870,32 @@ static int qemudDomainSaveFlag(struct qemud_driver *driver, virDomainPtr dom, header.xml_len += pad; } - /* Setup hook data needed by virFileOperation hook function */ - hdata.dom = dom; - hdata.path = path; - hdata.xml = xml; - hdata.header = &header; - - /* Write header to file, followed by XML */ + /* Obtain the file handle. */ /* First try creating the file as root */ if (!is_reg) { - int fd = open(path, O_WRONLY | O_TRUNC); + fd = open(path, O_WRONLY | O_TRUNC); if (fd < 0) { virReportSystemError(errno, _("unable to open %s"), path); goto endjob; } - if (qemudDomainSaveFileOpHook(fd, &hdata) < 0) { - VIR_FORCE_CLOSE(fd); - goto endjob; - } - if (VIR_CLOSE(fd) < 0) { - virReportSystemError(errno, _("unable to close %s"), path); - goto endjob; - } } else { - if ((rc = virFileOperation(path, O_CREAT|O_TRUNC|O_WRONLY, - S_IRUSR|S_IWUSR, - getuid(), getgid(), - qemudDomainSaveFileOpHook, &hdata, - 0)) < 0) { + if ((fd = virFileOperation(path, O_CREAT|O_TRUNC|O_WRONLY, + S_IRUSR|S_IWUSR, + getuid(), getgid(), + NULL, NULL, + VIR_FILE_OP_RETURN_FD)) < 0) { /* If we failed as root, and the error was permission-denied (EACCES or EPERM), assume it's on a network-connected share where root access is restricted (eg, root-squashed NFS). If the qemu user (driver->user) is non-root, just set a flag to bypass security driver shenanigans, and retry the operation after doing setuid to qemu user */ - + rc = fd; if (((rc != -EACCES) && (rc != -EPERM)) || driver->user == getuid()) { - virReportSystemError(-rc, _("Failed to create domain save file '%s'"), + virReportSystemError(-rc, + _("Failed to create domain save file '%s'"), path); goto endjob; } @@ -1932,7 +1920,7 @@ static int qemudDomainSaveFlag(struct qemud_driver *driver, virDomainPtr dom, default: /* local file - log the error returned by virFileOperation */ virReportSystemError(-rc, - _("Failed to create domain save file '%s'"), + _("Failed to create domain save file '%s'"), path); goto endjob; break; @@ -1941,13 +1929,15 @@ static int qemudDomainSaveFlag(struct qemud_driver *driver, virDomainPtr dom, /* Retry creating the file as driver->user */ - if ((rc = virFileOperation(path, O_CREAT|O_TRUNC|O_WRONLY, + if ((fd = virFileOperation(path, O_CREAT|O_TRUNC|O_WRONLY, S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP, driver->user, driver->group, - qemudDomainSaveFileOpHook, &hdata, - VIR_FILE_OP_AS_UID)) < 0) { - virReportSystemError(-rc, _("Error from child process creating '%s'"), - path); + NULL, NULL, + (VIR_FILE_OP_AS_UID | + VIR_FILE_OP_RETURN_FD))) < 0) { + virReportSystemError(-fd, + _("Error from child process creating '%s'"), + path); goto endjob; } @@ -1959,6 +1949,18 @@ static int qemudDomainSaveFlag(struct qemud_driver *driver, virDomainPtr dom, } } + /* Write header to file, followed by XML */ + hdata.dom = dom; + hdata.path = path; + hdata.xml = xml; + hdata.header = &header; + + if (qemudDomainSaveFileOpHook(fd, &hdata) < 0) { + VIR_FORCE_CLOSE(fd); + goto endjob; + } + + /* Allow qemu to access file */ if (!is_reg && qemuCgroupControllerActive(driver, VIR_CGROUP_CONTROLLER_DEVICES)) { @@ -1986,14 +1988,10 @@ static int qemudDomainSaveFlag(struct qemud_driver *driver, virDomainPtr dom, if (header.compressed == QEMUD_SAVE_FORMAT_RAW) { const char *args[] = { "cat", NULL }; - /* XXX gross - why don't we reuse the fd already opened earlier */ - int fd = -1; - if (qemuCapsGet(qemuCaps, QEMU_CAPS_MIGRATE_QEMU_FD) && - priv->monConfig->type == VIR_DOMAIN_CHR_TYPE_UNIX) - fd = open(path, O_WRONLY); qemuDomainObjEnterMonitorWithDriver(driver, vm); - if (fd >= 0 && lseek(fd, offset, SEEK_SET) == offset) { + if (qemuCapsGet(qemuCaps, QEMU_CAPS_MIGRATE_QEMU_FD) && + priv->monConfig->type == VIR_DOMAIN_CHR_TYPE_UNIX) { rc = qemuMonitorMigrateToFd(priv->mon, QEMU_MONITOR_MIGRATE_BACKGROUND, fd); @@ -2002,7 +2000,6 @@ static int qemudDomainSaveFlag(struct qemud_driver *driver, virDomainPtr dom, QEMU_MONITOR_MIGRATE_BACKGROUND, args, path, offset); } - VIR_FORCE_CLOSE(fd); qemuDomainObjExitMonitorWithDriver(driver, vm); } else { const char *prog = qemudSaveCompressionTypeToString(header.compressed); @@ -2021,6 +2018,11 @@ static int qemudDomainSaveFlag(struct qemud_driver *driver, virDomainPtr dom, if (rc < 0) goto endjob; + if (VIR_CLOSE(fd) < 0) { + virReportSystemError(errno, _("unable to close %s"), path); + goto endjob; + } + rc = qemuMigrationWaitForCompletion(driver, vm); if (rc < 0) @@ -2085,6 +2087,7 @@ endjob: cleanup: qemuCapsFree(qemuCaps); + VIR_FORCE_CLOSE(fd); VIR_FREE(xml); if (ret != 0 && is_reg) unlink(path); -- 1.7.4

On Wed, Mar 09, 2011 at 06:45:47PM -0700, Eric Blake wrote:
This makes root-squash NFS saves more efficient.
Or, indeed makes them work at all.
* src/qemu/qemu_driver.c (qemudDomainSaveFlag): Use new virFileOperation flag to open fd only once. --- src/qemu/qemu_driver.c | 75 +++++++++++++++++++++++++----------------------- 1 files changed, 39 insertions(+), 36 deletions(-)
ACK, assuming it passed NFS testing. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

* src/storage/storage_backend.c (virStorageBackendCreateRaw): Use new virFileOperation flag. --- src/storage/storage_backend.c | 48 +++++++++++++++++++++++++--------------- 1 files changed, 30 insertions(+), 18 deletions(-) diff --git a/src/storage/storage_backend.c b/src/storage/storage_backend.c index dffc73f..c7c5ccd 100644 --- a/src/storage/storage_backend.c +++ b/src/storage/storage_backend.c @@ -1,7 +1,7 @@ /* * storage_backend.c: internal storage driver backend contract * - * Copyright (C) 2007-2010 Red Hat, Inc. + * Copyright (C) 2007-2011 Red Hat, Inc. * Copyright (C) 2007-2008 Daniel P. Berrange * * This library is free software; you can redistribute it and/or @@ -344,11 +344,16 @@ virStorageBackendCreateRaw(virConnectPtr conn ATTRIBUTE_UNUSED, virStoragePoolObjPtr pool, virStorageVolDefPtr vol, virStorageVolDefPtr inputvol, - unsigned int flags ATTRIBUTE_UNUSED) + unsigned int flags) { int ret = -1; - int createstat; + int fd = -1; struct createRawFileOpHookData hdata = { vol, inputvol }; + uid_t uid; + gid_t gid; + int operation_flags; + + virCheckFlags(0, -1); if (vol->target.encryption != NULL) { virStorageReportError(VIR_ERR_NO_SUPPORT, @@ -357,24 +362,31 @@ virStorageBackendCreateRaw(virConnectPtr conn ATTRIBUTE_UNUSED, goto cleanup; } - uid_t uid = (vol->target.perms.uid == -1) ? getuid() : vol->target.perms.uid; - gid_t gid = (vol->target.perms.gid == -1) ? getgid() : vol->target.perms.gid; + uid = (vol->target.perms.uid == -1) ? getuid() : vol->target.perms.uid; + gid = (vol->target.perms.gid == -1) ? getgid() : vol->target.perms.gid; + operation_flags = VIR_FILE_OP_FORCE_PERMS | VIR_FILE_OP_RETURN_FD; + if (pool->def->type == VIR_STORAGE_POOL_NETFS) + operation_flags |= VIR_FILE_OP_AS_UID; - if ((createstat = virFileOperation(vol->target.path, - O_RDWR | O_CREAT | O_EXCL, - vol->target.perms.mode, uid, gid, - createRawFileOpHook, &hdata, - VIR_FILE_OP_FORCE_PERMS | - (pool->def->type == VIR_STORAGE_POOL_NETFS - ? VIR_FILE_OP_AS_UID : 0))) < 0) { - virReportSystemError(-createstat, - _("cannot create path '%s'"), - vol->target.path); - goto cleanup; + if ((fd = virFileOperation(vol->target.path, + O_RDWR | O_CREAT | O_EXCL, + vol->target.perms.mode, uid, gid, + NULL, NULL, operation_flags)) < 0) { + virReportSystemError(-fd, + _("cannot create path '%s'"), + vol->target.path); + goto cleanup; + } + + if ((ret = createRawFileOpHook(fd, &hdata)) < 0) { + virReportSystemError(-fd, + _("cannot create path '%s'"), + vol->target.path); + ret = -1; } - ret = 0; cleanup: + VIR_FORCE_CLOSE(fd); return ret; } -- 1.7.4

On Wed, Mar 09, 2011 at 06:45:48PM -0700, Eric Blake wrote:
* src/storage/storage_backend.c (virStorageBackendCreateRaw): Use new virFileOperation flag. --- src/storage/storage_backend.c | 48 +++++++++++++++++++++++++--------------- 1 files changed, 30 insertions(+), 18 deletions(-)
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

This patch intentionally doesn't change indentation, in order to make it easier to review the real changes. * src/util/util.h (VIR_FILE_OP_RETURN_FD, virFileOperationHook): Delete. (virFileOperation): Rename... (virFileOpenAs): ...and reduce parameters. * src/util/util.c (virFileOperationNoFork, virFileOperation): Rename and simplify. * src/qemu/qemu_driver.c (qemudDomainSaveFlag): Adjust caller. * src/storage/storage_backend.c (virStorageBackendCreateRaw): Likewise. * src/libvirt_private.syms: Reflect rename. --- src/libvirt_private.syms | 2 +- src/qemu/qemu_driver.c | 20 +++----- src/storage/storage_backend.c | 12 ++-- src/util/util.c | 103 +++++++++++++---------------------------- src/util/util.h | 15 ++---- 5 files changed, 53 insertions(+), 99 deletions(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index c0da78e..a9f7e39 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -876,8 +876,8 @@ virFileIsExecutable; virFileLinkPointsTo; virFileMakePath; virFileMatchesNameSuffix; +virFileOpenAs; virFileOpenTty; -virFileOperation; virFilePid; virFileReadAll; virFileReadLimFD; diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index a73c5b9..06bc969 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -1880,11 +1880,9 @@ static int qemudDomainSaveFlag(struct qemud_driver *driver, virDomainPtr dom, goto endjob; } } else { - if ((fd = virFileOperation(path, O_CREAT|O_TRUNC|O_WRONLY, - S_IRUSR|S_IWUSR, - getuid(), getgid(), - NULL, NULL, - VIR_FILE_OP_RETURN_FD)) < 0) { + if ((fd = virFileOpenAs(path, O_CREAT|O_TRUNC|O_WRONLY, + S_IRUSR|S_IWUSR, + getuid(), getgid(), 0)) < 0) { /* If we failed as root, and the error was permission-denied (EACCES or EPERM), assume it's on a network-connected share where root access is restricted (eg, root-squashed NFS). If the @@ -1918,7 +1916,7 @@ static int qemudDomainSaveFlag(struct qemud_driver *driver, virDomainPtr dom, case 0: default: - /* local file - log the error returned by virFileOperation */ + /* local file - log the error returned by virFileOpenAs */ virReportSystemError(-rc, _("Failed to create domain save file '%s'"), path); @@ -1929,12 +1927,10 @@ static int qemudDomainSaveFlag(struct qemud_driver *driver, virDomainPtr dom, /* Retry creating the file as driver->user */ - if ((fd = virFileOperation(path, O_CREAT|O_TRUNC|O_WRONLY, - S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP, - driver->user, driver->group, - NULL, NULL, - (VIR_FILE_OP_AS_UID | - VIR_FILE_OP_RETURN_FD))) < 0) { + if ((fd = virFileOpenAs(path, O_CREAT|O_TRUNC|O_WRONLY, + S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP, + driver->user, driver->group, + VIR_FILE_OPEN_AS_UID)) < 0) { virReportSystemError(-fd, _("Error from child process creating '%s'"), path); diff --git a/src/storage/storage_backend.c b/src/storage/storage_backend.c index c7c5ccd..7a3a2b8 100644 --- a/src/storage/storage_backend.c +++ b/src/storage/storage_backend.c @@ -364,14 +364,14 @@ virStorageBackendCreateRaw(virConnectPtr conn ATTRIBUTE_UNUSED, uid = (vol->target.perms.uid == -1) ? getuid() : vol->target.perms.uid; gid = (vol->target.perms.gid == -1) ? getgid() : vol->target.perms.gid; - operation_flags = VIR_FILE_OP_FORCE_PERMS | VIR_FILE_OP_RETURN_FD; + operation_flags = VIR_FILE_OPEN_FORCE_PERMS; if (pool->def->type == VIR_STORAGE_POOL_NETFS) - operation_flags |= VIR_FILE_OP_AS_UID; + operation_flags |= VIR_FILE_OPEN_AS_UID; - if ((fd = virFileOperation(vol->target.path, - O_RDWR | O_CREAT | O_EXCL, - vol->target.perms.mode, uid, gid, - NULL, NULL, operation_flags)) < 0) { + if ((fd = virFileOpenAs(vol->target.path, + O_RDWR | O_CREAT | O_EXCL, + vol->target.perms.mode, uid, gid, + operation_flags)) < 0) { virReportSystemError(-fd, _("cannot create path '%s'"), vol->target.path); diff --git a/src/util/util.c b/src/util/util.c index 6bf3219..137cdb9 100644 --- a/src/util/util.c +++ b/src/util/util.c @@ -1355,10 +1355,10 @@ virFileIsExecutable(const char *file) #ifndef WIN32 /* return -errno on failure, or 0 on success */ -static int virFileOperationNoFork(const char *path, int openflags, mode_t mode, - uid_t uid, gid_t gid, - virFileOperationHook hook, void *hookdata, - unsigned int flags) { +static int +virFileOpenAsNoFork(const char *path, int openflags, mode_t mode, + uid_t uid, gid_t gid, unsigned int flags) +{ int fd = -1; int ret = 0; struct stat st; @@ -1381,7 +1381,7 @@ static int virFileOperationNoFork(const char *path, int openflags, mode_t mode, path, (unsigned int) uid, (unsigned int) gid); goto error; } - if ((flags & VIR_FILE_OP_FORCE_PERMS) + if ((flags & VIR_FILE_OPEN_FORCE_PERMS) && (fchmod(fd, mode) < 0)) { ret = -errno; virReportSystemError(errno, @@ -1389,17 +1389,7 @@ static int virFileOperationNoFork(const char *path, int openflags, mode_t mode, path, mode); goto error; } - if ((hook) && ((ret = hook(fd, hookdata)) != 0)) { - goto error; - } - if (flags & VIR_FILE_OP_RETURN_FD) - return fd; - if (VIR_CLOSE(fd) < 0) { - ret = -errno; - virReportSystemError(errno, _("failed to close new file '%s'"), - path); - goto error; - } + return fd; error: VIR_FORCE_CLOSE(fd); @@ -1446,35 +1436,27 @@ error: } /** - * virFileOperation: + * virFileOpenAs: * @path: file to open or create * @openflags: flags to pass to open * @mode: mode to use on creation or when forcing permissions * @uid: uid that should own file on creation * @gid: gid that should own file - * @hook: callback to run once file is open; see below for restrictions - * @hookdata: opaque data for @hook - * @flags: bit-wise or of VIR_FILE_OP_* flags + * @flags: bit-wise or of VIR_FILE_OPEN_* flags * * Open @path, and execute an optional callback @hook on the open file * description. @hook must return 0 on success, or -errno on failure. - * If @flags includes VIR_FILE_OP_AS_UID, then open the file while the + * If @flags includes VIR_FILE_OPEN_AS_UID, then open the file while the * effective user id is @uid (by using a child process); this allows * one to bypass root-squashing NFS issues. If @flags includes - * VIR_FILE_OP_FORCE_PERMS, then ensure that @path has those + * VIR_FILE_OPEN_FORCE_PERMS, then ensure that @path has those * permissions before the callback, even if the file already existed - * with different permissions. If @flags includes - * VIR_FILE_OP_RETURN_FD, then use SCM_RIGHTS to guarantee that any - * @hook is run in the parent, and the return value (if non-negative) - * is the file descriptor, left open. Otherwise, @hook might be run - * in a child process, so it must be async-signal-safe (ie. no - * malloc() or anything else that depends on modifying locks held in - * the parent), no file descriptor remains open, and 0 is returned on - * success. Return -errno on failure. */ -int virFileOperation(const char *path, int openflags, mode_t mode, - uid_t uid, gid_t gid, - virFileOperationHook hook, void *hookdata, - unsigned int flags) { + * with different permissions. The return value (if non-negative) + * is the file descriptor, left open. Return -errno on failure. */ +int +virFileOpenAs(const char *path, int openflags, mode_t mode, + uid_t uid, gid_t gid, unsigned int flags) +{ struct stat st; pid_t pid; int waitret, status, ret = 0; @@ -1487,11 +1469,10 @@ int virFileOperation(const char *path, int openflags, mode_t mode, struct iovec iov; int forkRet; - if ((!(flags & VIR_FILE_OP_AS_UID)) + if ((!(flags & VIR_FILE_OPEN_AS_UID)) || (getuid() != 0) || ((uid == 0) && (gid == 0))) { - return virFileOperationNoFork(path, openflags, mode, uid, gid, - hook, hookdata, flags); + return virFileOpenAsNoFork(path, openflags, mode, uid, gid, flags); } /* parent is running as root, but caller requested that the @@ -1499,7 +1480,7 @@ int virFileOperation(const char *path, int openflags, mode_t mode, * following dance avoids problems caused by root-squashing * NFS servers. */ - if (flags & VIR_FILE_OP_RETURN_FD) { + { if (socketpair(AF_UNIX, SOCK_DGRAM, 0, pair) < 0) { ret = -errno; virReportSystemError(errno, @@ -1529,7 +1510,7 @@ int virFileOperation(const char *path, int openflags, mode_t mode, } if (pid) { /* parent */ - if (flags & VIR_FILE_OP_RETURN_FD) { + { VIR_FORCE_CLOSE(pair[1]); do { @@ -1565,20 +1546,13 @@ int virFileOperation(const char *path, int openflags, mode_t mode, goto parenterror; } ret = -WEXITSTATUS(status); - if (!WIFEXITED(status) || (ret == -EACCES) || - ((flags & VIR_FILE_OP_RETURN_FD) && fd == -1)) { + if (!WIFEXITED(status) || (ret == -EACCES) || fd == -1) { /* fall back to the simpler method, which works better in * some cases */ - return virFileOperationNoFork(path, openflags, mode, uid, gid, - hook, hookdata, flags); + return virFileOpenAsNoFork(path, openflags, mode, uid, gid, flags); } - if (!ret && (flags & VIR_FILE_OP_RETURN_FD)) { - if ((hook) && ((ret = hook(fd, hookdata)) != 0)) { - VIR_FORCE_CLOSE(fd); - goto parenterror; - } + if (!ret) ret = fd; - } parenterror: return ret; } @@ -1630,7 +1604,7 @@ parenterror: path, (unsigned int) uid, (unsigned int) gid); goto childerror; } - if ((flags & VIR_FILE_OP_FORCE_PERMS) + if ((flags & VIR_FILE_OPEN_FORCE_PERMS) && (fchmod(fd, mode) < 0)) { ret = -errno; virReportSystemError(errno, @@ -1638,7 +1612,7 @@ parenterror: path, mode); goto childerror; } - if (flags & VIR_FILE_OP_RETURN_FD) { + { VIR_FORCE_CLOSE(pair[0]); memcpy(CMSG_DATA(cmsg), &fd, sizeof(fd)); @@ -1651,17 +1625,6 @@ parenterror: VIR_FORCE_CLOSE(pair[1]); goto childerror; } - ret = 0; - } else { - if ((hook) && ((ret = hook(fd, hookdata)) != 0)) - goto childerror; - if (VIR_CLOSE(fd) < 0) { - ret = -errno; - virReportSystemError(errno, - _("child failed to close new file '%s'"), - path); - goto childerror; - } } ret = 0; @@ -1783,17 +1746,15 @@ childerror: #else /* WIN32 */ /* return -errno on failure, or 0 on success */ -int virFileOperation(const char *path ATTRIBUTE_UNUSED, - int openflags ATTRIBUTE_UNUSED, - mode_t mode ATTRIBUTE_UNUSED, - uid_t uid ATTRIBUTE_UNUSED, - gid_t gid ATTRIBUTE_UNUSED, - virFileOperationHook hook ATTRIBUTE_UNUSED, - void *hookdata ATTRIBUTE_UNUSED, - unsigned int flags ATTRIBUTE_UNUSED) +int virFileOpenAs(const char *path ATTRIBUTE_UNUSED, + int openflags ATTRIBUTE_UNUSED, + mode_t mode ATTRIBUTE_UNUSED, + uid_t uid ATTRIBUTE_UNUSED, + gid_t gid ATTRIBUTE_UNUSED, + unsigned int flags ATTRIBUTE_UNUSED) { virUtilError(VIR_ERR_INTERNAL_ERROR, - "%s", _("virFileOperation is not implemented for WIN32")); + "%s", _("virFileOpenAs is not implemented for WIN32")); return -ENOSYS; } diff --git a/src/util/util.h b/src/util/util.h index 3970d58..cecd348 100644 --- a/src/util/util.h +++ b/src/util/util.h @@ -125,16 +125,13 @@ bool virFileIsExecutable(const char *file) ATTRIBUTE_NONNULL(1); char *virFileSanitizePath(const char *path); enum { - VIR_FILE_OP_NONE = 0, - VIR_FILE_OP_AS_UID = (1 << 0), - VIR_FILE_OP_FORCE_PERMS = (1 << 1), - VIR_FILE_OP_RETURN_FD = (1 << 2), + VIR_FILE_OPEN_NONE = 0, + VIR_FILE_OPEN_AS_UID = (1 << 0), + VIR_FILE_OPEN_FORCE_PERMS = (1 << 1), }; -typedef int (*virFileOperationHook)(int fd, void *data); -int virFileOperation(const char *path, int openflags, mode_t mode, - uid_t uid, gid_t gid, - virFileOperationHook hook, void *hookdata, - unsigned int flags) +int virFileOpenAs(const char *path, int openflags, mode_t mode, + uid_t uid, gid_t gid, + unsigned int flags) ATTRIBUTE_NONNULL(1) ATTRIBUTE_RETURN_CHECK; enum { -- 1.7.4

On Wed, Mar 09, 2011 at 06:45:49PM -0700, Eric Blake wrote:
This patch intentionally doesn't change indentation, in order to make it easier to review the real changes.
More importantly it removes the hook function now.
* src/util/util.h (VIR_FILE_OP_RETURN_FD, virFileOperationHook): Delete. (virFileOperation): Rename... (virFileOpenAs): ...and reduce parameters. * src/util/util.c (virFileOperationNoFork, virFileOperation): Rename and simplify. * src/qemu/qemu_driver.c (qemudDomainSaveFlag): Adjust caller. * src/storage/storage_backend.c (virStorageBackendCreateRaw): Likewise. * src/libvirt_private.syms: Reflect rename. --- src/libvirt_private.syms | 2 +- src/qemu/qemu_driver.c | 20 +++----- src/storage/storage_backend.c | 12 ++-- src/util/util.c | 103 +++++++++++++---------------------------- src/util/util.h | 15 ++---- 5 files changed, 53 insertions(+), 99 deletions(-)
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 03/09/2011 08:45 PM, Eric Blake wrote:
This patch intentionally doesn't change indentation, in order to make it easier to review the real changes.
* src/util/util.h (VIR_FILE_OP_RETURN_FD, virFileOperationHook): Delete. (virFileOperation): Rename... (virFileOpenAs): ...and reduce parameters. * src/util/util.c (virFileOperationNoFork, virFileOperation): Rename and simplify. * src/qemu/qemu_driver.c (qemudDomainSaveFlag): Adjust caller. * src/storage/storage_backend.c (virStorageBackendCreateRaw): Likewise. * src/libvirt_private.syms: Reflect rename. --- src/libvirt_private.syms | 2 +- src/qemu/qemu_driver.c | 20 +++----- src/storage/storage_backend.c | 12 ++-- src/util/util.c | 103 +++++++++++++---------------------------- src/util/util.h | 15 ++---- 5 files changed, 53 insertions(+), 99 deletions(-)
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index c0da78e..a9f7e39 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -876,8 +876,8 @@ virFileIsExecutable; virFileLinkPointsTo; virFileMakePath; virFileMatchesNameSuffix; +virFileOpenAs; virFileOpenTty; -virFileOperation; virFilePid; virFileReadAll; virFileReadLimFD; diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index a73c5b9..06bc969 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -1880,11 +1880,9 @@ static int qemudDomainSaveFlag(struct qemud_driver *driver, virDomainPtr dom, goto endjob; } } else { - if ((fd = virFileOperation(path, O_CREAT|O_TRUNC|O_WRONLY, - S_IRUSR|S_IWUSR, - getuid(), getgid(), - NULL, NULL, - VIR_FILE_OP_RETURN_FD))< 0) { + if ((fd = virFileOpenAs(path, O_CREAT|O_TRUNC|O_WRONLY, + S_IRUSR|S_IWUSR, + getuid(), getgid(), 0))< 0) { /* If we failed as root, and the error was permission-denied (EACCES or EPERM), assume it's on a network-connected share where root access is restricted (eg, root-squashed NFS). If the @@ -1918,7 +1916,7 @@ static int qemudDomainSaveFlag(struct qemud_driver *driver, virDomainPtr dom,
case 0: default: - /* local file - log the error returned by virFileOperation */ + /* local file - log the error returned by virFileOpenAs */ virReportSystemError(-rc, _("Failed to create domain save file '%s'"), path); @@ -1929,12 +1927,10 @@ static int qemudDomainSaveFlag(struct qemud_driver *driver, virDomainPtr dom,
/* Retry creating the file as driver->user */
- if ((fd = virFileOperation(path, O_CREAT|O_TRUNC|O_WRONLY, - S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP, - driver->user, driver->group, - NULL, NULL, - (VIR_FILE_OP_AS_UID | - VIR_FILE_OP_RETURN_FD)))< 0) { + if ((fd = virFileOpenAs(path, O_CREAT|O_TRUNC|O_WRONLY, + S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP, + driver->user, driver->group, + VIR_FILE_OPEN_AS_UID))< 0) { virReportSystemError(-fd, _("Error from child process creating '%s'"), path); diff --git a/src/storage/storage_backend.c b/src/storage/storage_backend.c index c7c5ccd..7a3a2b8 100644 --- a/src/storage/storage_backend.c +++ b/src/storage/storage_backend.c @@ -364,14 +364,14 @@ virStorageBackendCreateRaw(virConnectPtr conn ATTRIBUTE_UNUSED,
uid = (vol->target.perms.uid == -1) ? getuid() : vol->target.perms.uid; gid = (vol->target.perms.gid == -1) ? getgid() : vol->target.perms.gid; - operation_flags = VIR_FILE_OP_FORCE_PERMS | VIR_FILE_OP_RETURN_FD; + operation_flags = VIR_FILE_OPEN_FORCE_PERMS; if (pool->def->type == VIR_STORAGE_POOL_NETFS) - operation_flags |= VIR_FILE_OP_AS_UID; + operation_flags |= VIR_FILE_OPEN_AS_UID;
- if ((fd = virFileOperation(vol->target.path, - O_RDWR | O_CREAT | O_EXCL, - vol->target.perms.mode, uid, gid, - NULL, NULL, operation_flags))< 0) { + if ((fd = virFileOpenAs(vol->target.path, + O_RDWR | O_CREAT | O_EXCL, + vol->target.perms.mode, uid, gid, + operation_flags))< 0) { virReportSystemError(-fd, _("cannot create path '%s'"), vol->target.path); diff --git a/src/util/util.c b/src/util/util.c index 6bf3219..137cdb9 100644 --- a/src/util/util.c +++ b/src/util/util.c @@ -1355,10 +1355,10 @@ virFileIsExecutable(const char *file)
#ifndef WIN32 /* return -errno on failure, or 0 on success */ -static int virFileOperationNoFork(const char *path, int openflags, mode_t mode, - uid_t uid, gid_t gid, - virFileOperationHook hook, void *hookdata, - unsigned int flags) { +static int +virFileOpenAsNoFork(const char *path, int openflags, mode_t mode, + uid_t uid, gid_t gid, unsigned int flags) +{ int fd = -1; int ret = 0; struct stat st; @@ -1381,7 +1381,7 @@ static int virFileOperationNoFork(const char *path, int openflags, mode_t mode, path, (unsigned int) uid, (unsigned int) gid); goto error; } - if ((flags& VIR_FILE_OP_FORCE_PERMS) + if ((flags& VIR_FILE_OPEN_FORCE_PERMS) && (fchmod(fd, mode)< 0)) { ret = -errno; virReportSystemError(errno, @@ -1389,17 +1389,7 @@ static int virFileOperationNoFork(const char *path, int openflags, mode_t mode, path, mode); goto error; } - if ((hook)&& ((ret = hook(fd, hookdata)) != 0)) { - goto error; - } - if (flags& VIR_FILE_OP_RETURN_FD) - return fd; - if (VIR_CLOSE(fd)< 0) { - ret = -errno; - virReportSystemError(errno, _("failed to close new file '%s'"), - path); - goto error; - } + return fd;
error: VIR_FORCE_CLOSE(fd); @@ -1446,35 +1436,27 @@ error: }
/** - * virFileOperation: + * virFileOpenAs: * @path: file to open or create * @openflags: flags to pass to open * @mode: mode to use on creation or when forcing permissions * @uid: uid that should own file on creation * @gid: gid that should own file - * @hook: callback to run once file is open; see below for restrictions - * @hookdata: opaque data for @hook - * @flags: bit-wise or of VIR_FILE_OP_* flags + * @flags: bit-wise or of VIR_FILE_OPEN_* flags * * Open @path, and execute an optional callback @hook on the open file * description. @hook must return 0 on success, or -errno on failure. - * If @flags includes VIR_FILE_OP_AS_UID, then open the file while the + * If @flags includes VIR_FILE_OPEN_AS_UID, then open the file while the * effective user id is @uid (by using a child process); this allows * one to bypass root-squashing NFS issues. If @flags includes - * VIR_FILE_OP_FORCE_PERMS, then ensure that @path has those + * VIR_FILE_OPEN_FORCE_PERMS, then ensure that @path has those * permissions before the callback, even if the file already existed - * with different permissions. If @flags includes - * VIR_FILE_OP_RETURN_FD, then use SCM_RIGHTS to guarantee that any - * @hook is run in the parent, and the return value (if non-negative) - * is the file descriptor, left open. Otherwise, @hook might be run - * in a child process, so it must be async-signal-safe (ie. no - * malloc() or anything else that depends on modifying locks held in - * the parent), no file descriptor remains open, and 0 is returned on - * success. Return -errno on failure. */ -int virFileOperation(const char *path, int openflags, mode_t mode, - uid_t uid, gid_t gid, - virFileOperationHook hook, void *hookdata, - unsigned int flags) { + * with different permissions. The return value (if non-negative) + * is the file descriptor, left open. Return -errno on failure. */ +int +virFileOpenAs(const char *path, int openflags, mode_t mode, + uid_t uid, gid_t gid, unsigned int flags) +{ struct stat st; pid_t pid; int waitret, status, ret = 0; @@ -1487,11 +1469,10 @@ int virFileOperation(const char *path, int openflags, mode_t mode, struct iovec iov; int forkRet;
- if ((!(flags& VIR_FILE_OP_AS_UID)) + if ((!(flags& VIR_FILE_OPEN_AS_UID)) || (getuid() != 0) || ((uid == 0)&& (gid == 0))) { - return virFileOperationNoFork(path, openflags, mode, uid, gid, - hook, hookdata, flags); + return virFileOpenAsNoFork(path, openflags, mode, uid, gid, flags); }
/* parent is running as root, but caller requested that the @@ -1499,7 +1480,7 @@ int virFileOperation(const char *path, int openflags, mode_t mode, * following dance avoids problems caused by root-squashing * NFS servers. */
- if (flags& VIR_FILE_OP_RETURN_FD) { + { if (socketpair(AF_UNIX, SOCK_DGRAM, 0, pair)< 0) { ret = -errno; virReportSystemError(errno, @@ -1529,7 +1510,7 @@ int virFileOperation(const char *path, int openflags, mode_t mode, }
if (pid) { /* parent */ - if (flags& VIR_FILE_OP_RETURN_FD) { + { VIR_FORCE_CLOSE(pair[1]);
do { @@ -1565,20 +1546,13 @@ int virFileOperation(const char *path, int openflags, mode_t mode, goto parenterror; } ret = -WEXITSTATUS(status); - if (!WIFEXITED(status) || (ret == -EACCES) || - ((flags& VIR_FILE_OP_RETURN_FD)&& fd == -1)) { + if (!WIFEXITED(status) || (ret == -EACCES) || fd == -1) { /* fall back to the simpler method, which works better in * some cases */ - return virFileOperationNoFork(path, openflags, mode, uid, gid, - hook, hookdata, flags); + return virFileOpenAsNoFork(path, openflags, mode, uid, gid, flags); } - if (!ret&& (flags& VIR_FILE_OP_RETURN_FD)) { - if ((hook)&& ((ret = hook(fd, hookdata)) != 0)) { - VIR_FORCE_CLOSE(fd); - goto parenterror; - } + if (!ret) ret = fd; - } parenterror: return ret; } @@ -1630,7 +1604,7 @@ parenterror: path, (unsigned int) uid, (unsigned int) gid); goto childerror; } - if ((flags& VIR_FILE_OP_FORCE_PERMS) + if ((flags& VIR_FILE_OPEN_FORCE_PERMS) && (fchmod(fd, mode)< 0)) { ret = -errno; virReportSystemError(errno, @@ -1638,7 +1612,7 @@ parenterror: path, mode); goto childerror; } - if (flags& VIR_FILE_OP_RETURN_FD) { + { VIR_FORCE_CLOSE(pair[0]);
This VIR_FORCE_CLOSE(pair[0]) should be moved up to the very start of the child code - otherwise any error prior to this point will leave it dangling for _exit() to clean up.
memcpy(CMSG_DATA(cmsg),&fd, sizeof(fd));
@@ -1651,17 +1625,6 @@ parenterror: VIR_FORCE_CLOSE(pair[1]);
A leftover from 05/15 - I think thie VIR_FORCE_CLOSE should happen at/after childerror:. That way it will always get cleaned up, whether or not there is an error at any time during child execution.
goto childerror; } - ret = 0; - } else { - if ((hook)&& ((ret = hook(fd, hookdata)) != 0)) - goto childerror; - if (VIR_CLOSE(fd)< 0) { - ret = -errno; - virReportSystemError(errno, - _("child failed to close new file '%s'"), - path); - goto childerror; - } }
ret = 0; @@ -1783,17 +1746,15 @@ childerror: #else /* WIN32 */
/* return -errno on failure, or 0 on success */ -int virFileOperation(const char *path ATTRIBUTE_UNUSED, - int openflags ATTRIBUTE_UNUSED, - mode_t mode ATTRIBUTE_UNUSED, - uid_t uid ATTRIBUTE_UNUSED, - gid_t gid ATTRIBUTE_UNUSED, - virFileOperationHook hook ATTRIBUTE_UNUSED, - void *hookdata ATTRIBUTE_UNUSED, - unsigned int flags ATTRIBUTE_UNUSED) +int virFileOpenAs(const char *path ATTRIBUTE_UNUSED, + int openflags ATTRIBUTE_UNUSED, + mode_t mode ATTRIBUTE_UNUSED, + uid_t uid ATTRIBUTE_UNUSED, + gid_t gid ATTRIBUTE_UNUSED, + unsigned int flags ATTRIBUTE_UNUSED) { virUtilError(VIR_ERR_INTERNAL_ERROR, - "%s", _("virFileOperation is not implemented for WIN32")); + "%s", _("virFileOpenAs is not implemented for WIN32"));
return -ENOSYS; } diff --git a/src/util/util.h b/src/util/util.h index 3970d58..cecd348 100644 --- a/src/util/util.h +++ b/src/util/util.h @@ -125,16 +125,13 @@ bool virFileIsExecutable(const char *file) ATTRIBUTE_NONNULL(1); char *virFileSanitizePath(const char *path);
enum { - VIR_FILE_OP_NONE = 0, - VIR_FILE_OP_AS_UID = (1<< 0), - VIR_FILE_OP_FORCE_PERMS = (1<< 1), - VIR_FILE_OP_RETURN_FD = (1<< 2), + VIR_FILE_OPEN_NONE = 0, + VIR_FILE_OPEN_AS_UID = (1<< 0), + VIR_FILE_OPEN_FORCE_PERMS = (1<< 1), }; -typedef int (*virFileOperationHook)(int fd, void *data); -int virFileOperation(const char *path, int openflags, mode_t mode, - uid_t uid, gid_t gid, - virFileOperationHook hook, void *hookdata, - unsigned int flags) +int virFileOpenAs(const char *path, int openflags, mode_t mode, + uid_t uid, gid_t gid, + unsigned int flags) ATTRIBUTE_NONNULL(1) ATTRIBUTE_RETURN_CHECK;
enum {
Aside from the two problems with the locations of VIR_FORCE_CLOSE(pair[n]), this all looks fine, and hasn't broken either backing store or save images on root-squash NFS, so ACK with those two changes.

On 03/11/2011 08:39 AM, Laine Stump wrote:
path, mode); goto childerror; } - if (flags& VIR_FILE_OP_RETURN_FD) { + { VIR_FORCE_CLOSE(pair[0]);
This VIR_FORCE_CLOSE(pair[0]) should be moved up to the very start of the child code - otherwise any error prior to this point will leave it dangling for _exit() to clean up.
Well, _exit() is guaranteed to close fds, so I was being a bit lazy in the child :) But I'll go ahead and rearrange this. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

Separating the indentation from the real patch made review easier. * src/util/util.c (virFileOpenAs): Whitespace changes. --- src/util/util.c | 101 +++++++++++++++++++++++++----------------------------- 1 files changed, 47 insertions(+), 54 deletions(-) diff --git a/src/util/util.c b/src/util/util.c index 137cdb9..8d00ded 100644 --- a/src/util/util.c +++ b/src/util/util.c @@ -1403,8 +1403,7 @@ static int virDirCreateNoFork(const char *path, mode_t mode, uid_t uid, gid_t gi struct stat st; if ((mkdir(path, mode) < 0) - && !((errno == EEXIST) && (flags & VIR_DIR_CREATE_ALLOW_EXIST))) - { + && !((errno == EEXIST) && (flags & VIR_DIR_CREATE_ALLOW_EXIST))) { ret = -errno; virReportSystemError(errno, _("failed to create directory '%s'"), path); @@ -1480,28 +1479,26 @@ virFileOpenAs(const char *path, int openflags, mode_t mode, * following dance avoids problems caused by root-squashing * NFS servers. */ - { - if (socketpair(AF_UNIX, SOCK_DGRAM, 0, pair) < 0) { - ret = -errno; - virReportSystemError(errno, - _("failed to create socket needed for '%s'"), - path); - return ret; - } - - memset(&msg, 0, sizeof(msg)); - iov.iov_base = &dummy; - iov.iov_len = 1; - msg.msg_iov = &iov; - msg.msg_iovlen = 1; - msg.msg_control = buf; - msg.msg_controllen = sizeof(buf); - cmsg = CMSG_FIRSTHDR(&msg); - cmsg->cmsg_level = SOL_SOCKET; - cmsg->cmsg_type = SCM_RIGHTS; - cmsg->cmsg_len = CMSG_LEN(sizeof(fd)); + if (socketpair(AF_UNIX, SOCK_DGRAM, 0, pair) < 0) { + ret = -errno; + virReportSystemError(errno, + _("failed to create socket needed for '%s'"), + path); + return ret; } + memset(&msg, 0, sizeof(msg)); + iov.iov_base = &dummy; + iov.iov_len = 1; + msg.msg_iov = &iov; + msg.msg_iovlen = 1; + msg.msg_control = buf; + msg.msg_controllen = sizeof(buf); + cmsg = CMSG_FIRSTHDR(&msg); + cmsg->cmsg_level = SOL_SOCKET; + cmsg->cmsg_type = SCM_RIGHTS; + cmsg->cmsg_len = CMSG_LEN(sizeof(fd)); + forkRet = virFork(&pid); if (pid < 0) { @@ -1510,29 +1507,27 @@ virFileOpenAs(const char *path, int openflags, mode_t mode, } if (pid) { /* parent */ - { - VIR_FORCE_CLOSE(pair[1]); - - do { - ret = recvmsg(pair[0], &msg, 0); - } while (ret < 0 && errno == EINTR); - - if (ret < 0) { - ret = -errno; - VIR_FORCE_CLOSE(pair[0]); - while ((waitret = waitpid(pid, NULL, 0) == -1) - && (errno == EINTR)); - goto parenterror; - } + VIR_FORCE_CLOSE(pair[1]); + + do { + ret = recvmsg(pair[0], &msg, 0); + } while (ret < 0 && errno == EINTR); + + if (ret < 0) { + ret = -errno; VIR_FORCE_CLOSE(pair[0]); + while ((waitret = waitpid(pid, NULL, 0) == -1) + && (errno == EINTR)); + goto parenterror; + } + VIR_FORCE_CLOSE(pair[0]); - /* See if fd was transferred. */ - cmsg = CMSG_FIRSTHDR(&msg); - if (cmsg && cmsg->cmsg_len == CMSG_LEN(sizeof(fd)) && - cmsg->cmsg_level == SOL_SOCKET && - cmsg->cmsg_type == SCM_RIGHTS) { - memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd)); - } + /* See if fd was transferred. */ + cmsg = CMSG_FIRSTHDR(&msg); + if (cmsg && cmsg->cmsg_len == CMSG_LEN(sizeof(fd)) && + cmsg->cmsg_level == SOL_SOCKET && + cmsg->cmsg_type == SCM_RIGHTS) { + memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd)); } /* wait for child to complete, and retrieve its exit code */ @@ -1612,19 +1607,17 @@ parenterror: path, mode); goto childerror; } - { - VIR_FORCE_CLOSE(pair[0]); - memcpy(CMSG_DATA(cmsg), &fd, sizeof(fd)); + VIR_FORCE_CLOSE(pair[0]); + memcpy(CMSG_DATA(cmsg), &fd, sizeof(fd)); - do { - ret = sendmsg(pair[1], &msg, 0); - } while (ret < 0 && errno == EINTR); + do { + ret = sendmsg(pair[1], &msg, 0); + } while (ret < 0 && errno == EINTR); - if (ret < 0) { - ret = -errno; - VIR_FORCE_CLOSE(pair[1]); - goto childerror; - } + if (ret < 0) { + ret = -errno; + VIR_FORCE_CLOSE(pair[1]); + goto childerror; } ret = 0; -- 1.7.4

On Wed, Mar 09, 2011 at 06:45:50PM -0700, Eric Blake wrote:
Separating the indentation from the real patch made review easier.
* src/util/util.c (virFileOpenAs): Whitespace changes. --- src/util/util.c | 101 +++++++++++++++++++++++++----------------------------- 1 files changed, 47 insertions(+), 54 deletions(-)
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

* src/storage/storage_backend.c (createRawFileOpHook): Change signature. (struct createRawFileOpHookData): Delete unused struct. (virStorageBackendCreateRaw): Adjust caller. * src/qemu/qemu_driver.c (struct fileOpHookData): Delete unused struct. (qemudDomainSaveFileOpHook): Rename... (qemuDomainSaveFileOpHook): ...and change signature. (qemudDomainSaveFlag): Adjust caller. --- Phooey - Just noticed that my ISP griped that git send-email sent too many mails in one session (is there a setting to tweak to force multiple sessions?) Hopefully I rethreaded this correctly. src/qemu/qemu_driver.c | 29 +++++++++-------------------- src/storage/storage_backend.c | 34 ++++++++++++++-------------------- 2 files changed, 23 insertions(+), 40 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 06bc969..11ba910 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -1741,30 +1741,25 @@ struct qemud_save_header { int unused[15]; }; -struct fileOpHookData { - virDomainPtr dom; - const char *path; - char *xml; - struct qemud_save_header *header; -}; - /* return -errno on failure, or 0 on success */ -static int qemudDomainSaveFileOpHook(int fd, void *data) { - struct fileOpHookData *hdata = data; +static int +qemuDomainSaveHeader(int fd, const char *path, char *xml, + struct qemud_save_header *header) +{ int ret = 0; - if (safewrite(fd, hdata->header, sizeof(*hdata->header)) != sizeof(*hdata->header)) { + if (safewrite(fd, header, sizeof(*header)) != sizeof(*header)) { ret = -errno; qemuReportError(VIR_ERR_OPERATION_FAILED, _("failed to write header to domain save file '%s'"), - hdata->path); + path); goto endjob; } - if (safewrite(fd, hdata->xml, hdata->header->xml_len) != hdata->header->xml_len) { + if (safewrite(fd, xml, header->xml_len) != header->xml_len) { ret = -errno; qemuReportError(VIR_ERR_OPERATION_FAILED, - _("failed to write xml to '%s'"), hdata->path); + _("failed to write xml to '%s'"), path); goto endjob; } endjob: @@ -1780,7 +1775,6 @@ static int qemudDomainSaveFlag(struct qemud_driver *driver, virDomainPtr dom, { char *xml = NULL; struct qemud_save_header header; - struct fileOpHookData hdata; int bypassSecurityDriver = 0; int ret = -1; int rc; @@ -1946,12 +1940,7 @@ static int qemudDomainSaveFlag(struct qemud_driver *driver, virDomainPtr dom, } /* Write header to file, followed by XML */ - hdata.dom = dom; - hdata.path = path; - hdata.xml = xml; - hdata.header = &header; - - if (qemudDomainSaveFileOpHook(fd, &hdata) < 0) { + if (qemuDomainSaveHeader(fd, path, xml, &header) < 0) { VIR_FORCE_CLOSE(fd); goto endjob; } diff --git a/src/storage/storage_backend.c b/src/storage/storage_backend.c index 7a3a2b8..be980b0 100644 --- a/src/storage/storage_backend.c +++ b/src/storage/storage_backend.c @@ -266,31 +266,27 @@ cleanup: return ret; } -struct createRawFileOpHookData { - virStorageVolDefPtr vol; - virStorageVolDefPtr inputvol; -}; - -static int createRawFileOpHook(int fd, void *data) { - struct createRawFileOpHookData *hdata = data; +static int +createRawFile(int fd, virStorageVolDefPtr vol, + virStorageVolDefPtr inputvol) +{ int ret = 0; unsigned long long remain; /* Seek to the final size, so the capacity is available upfront * for progress reporting */ - if (ftruncate(fd, hdata->vol->capacity) < 0) { + if (ftruncate(fd, vol->capacity) < 0) { ret = -errno; virReportSystemError(errno, _("cannot extend file '%s'"), - hdata->vol->target.path); + vol->target.path); goto cleanup; } - remain = hdata->vol->allocation; + remain = vol->allocation; - if (hdata->inputvol) { - ret = virStorageBackendCopyToFD(hdata->vol, hdata->inputvol, - fd, &remain, 1); + if (inputvol) { + ret = virStorageBackendCopyToFD(vol, inputvol, fd, &remain, 1); if (ret < 0) { goto cleanup; } @@ -308,11 +304,10 @@ static int createRawFileOpHook(int fd, void *data) { if (bytes > remain) bytes = remain; - if (safezero(fd, 0, hdata->vol->allocation - remain, - bytes) != 0) { + if (safezero(fd, 0, vol->allocation - remain, bytes) != 0) { ret = -errno; virReportSystemError(errno, _("cannot fill file '%s'"), - hdata->vol->target.path); + vol->target.path); goto cleanup; } remain -= bytes; @@ -321,7 +316,7 @@ static int createRawFileOpHook(int fd, void *data) { if (safezero(fd, 0, 0, remain) != 0) { ret = -errno; virReportSystemError(errno, _("cannot fill file '%s'"), - hdata->vol->target.path); + vol->target.path); goto cleanup; } } @@ -331,7 +326,7 @@ static int createRawFileOpHook(int fd, void *data) { if (fsync(fd) < 0) { ret = -errno; virReportSystemError(errno, _("cannot sync data to file '%s'"), - hdata->vol->target.path); + vol->target.path); goto cleanup; } @@ -348,7 +343,6 @@ virStorageBackendCreateRaw(virConnectPtr conn ATTRIBUTE_UNUSED, { int ret = -1; int fd = -1; - struct createRawFileOpHookData hdata = { vol, inputvol }; uid_t uid; gid_t gid; int operation_flags; @@ -378,7 +372,7 @@ virStorageBackendCreateRaw(virConnectPtr conn ATTRIBUTE_UNUSED, goto cleanup; } - if ((ret = createRawFileOpHook(fd, &hdata)) < 0) { + if ((ret = createRawFile(fd, vol, inputvol)) < 0) { virReportSystemError(-fd, _("cannot create path '%s'"), vol->target.path); -- 1.7.4

Direct access to an open file is so much simpler than passing everything through a pipe! * src/qemu/qemu_driver.c (qemudOpenAsUID) (qemudDomainSaveImageClose): Delete. (qemudDomainSaveImageOpen): Rename... (qemuDomainSaveImageOpen): ...and drop read_pid argument. Use virFileOpenAs instead of qemudOpenAsUID. (qemudDomainSaveImageStartVM, qemudDomainRestore) (qemudDomainObjRestore): Rename... (qemuDomainSaveImageStartVM, qemuDomainRestore) (qemDomainObjRestore): ...and simplify accordingly. (qemudDomainObjStart, qemuDriver): Update callers. --- src/qemu/qemu_driver.c | 265 ++++++++--------------------------------------- 1 files changed, 45 insertions(+), 220 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 11ba910..9de19ea 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -3012,183 +3012,32 @@ cleanup: return ret; } -/* qemudOpenAsUID() - pipe/fork/setuid/open a file, and return the - pipe fd to caller, so that it can read from the file. Also return - the pid of the child process, so the caller can wait for it to exit - after it's finished reading (to avoid a zombie, if nothing - else). */ - -static int -qemudOpenAsUID(const char *path, uid_t uid, gid_t gid, pid_t *child_pid) -{ - int pipefd[2]; - int fd = -1; - - *child_pid = -1; - - if (pipe(pipefd) < 0) { - virReportSystemError(errno, - _("failed to create pipe to read '%s'"), - path); - pipefd[0] = pipefd[1] = -1; - goto parent_cleanup; - } - - int forkRet = virFork(child_pid); - - if (*child_pid < 0) { - virReportSystemError(errno, - _("failed to fork child to read '%s'"), - path); - goto parent_cleanup; - } - - if (*child_pid > 0) { - - /* parent */ - - /* parent doesn't need the write side of the pipe */ - VIR_FORCE_CLOSE(pipefd[1]); - - if (forkRet < 0) { - virReportSystemError(errno, - _("failed in parent after forking child to read '%s'"), - path); - goto parent_cleanup; - } - /* caller gets the read side of the pipe */ - fd = pipefd[0]; - pipefd[0] = -1; -parent_cleanup: - VIR_FORCE_CLOSE(pipefd[0]); - VIR_FORCE_CLOSE(pipefd[1]); - if ((fd < 0) && (*child_pid > 0)) { - /* a child process was started and subsequently an error - occurred in the parent, so we need to wait for it to - exit, but its status is inconsequential. */ - while ((waitpid(*child_pid, NULL, 0) == -1) - && (errno == EINTR)) { - /* empty */ - } - *child_pid = -1; - } - return fd; - } - - /* child */ - - /* setuid to the qemu user, then open the file, read it, - and stuff it into the pipe for the parent process to - read */ - int exit_code; - char *buf = NULL; - size_t bufsize = 1024 * 1024; - int bytesread; - - /* child doesn't need the read side of the pipe */ - VIR_FORCE_CLOSE(pipefd[0]); - - if (forkRet < 0) { - exit_code = errno; - virReportSystemError(errno, - _("failed in child after forking to read '%s'"), - path); - goto child_cleanup; - } - - if (virSetUIDGID(uid, gid) < 0) { - exit_code = errno; - goto child_cleanup; - } - - if ((fd = open(path, O_RDONLY)) < 0) { - exit_code = errno; - virReportSystemError(errno, - _("cannot open '%s' as uid %d"), - path, uid); - goto child_cleanup; - } - - if (VIR_ALLOC_N(buf, bufsize) < 0) { - exit_code = ENOMEM; - virReportOOMError(); - goto child_cleanup; - } - - /* read from fd and write to pipefd[1] until EOF */ - do { - if ((bytesread = saferead(fd, buf, bufsize)) < 0) { - exit_code = errno; - virReportSystemError(errno, - _("child failed reading from '%s'"), - path); - goto child_cleanup; - } - if (safewrite(pipefd[1], buf, bytesread) != bytesread) { - exit_code = errno; - virReportSystemError(errno, "%s", - _("child failed writing to pipe")); - goto child_cleanup; - } - } while (bytesread > 0); - exit_code = 0; - -child_cleanup: - VIR_FREE(buf); - VIR_FORCE_CLOSE(fd); - VIR_FORCE_CLOSE(pipefd[1]); - _exit(exit_code); -} - -static int qemudDomainSaveImageClose(int fd, pid_t read_pid, int *status) -{ - int ret = 0; - - if (VIR_CLOSE(fd) < 0) { - virReportSystemError(errno, "%s", - _("cannot close file")); - } - - if (read_pid != -1) { - /* reap the process that read the file */ - while ((ret = waitpid(read_pid, status, 0)) == -1 - && errno == EINTR) { - /* empty */ - } - } else if (status) { - *status = 0; - } - - return ret; -} - -static int ATTRIBUTE_NONNULL(3) ATTRIBUTE_NONNULL(4) ATTRIBUTE_NONNULL(5) -qemudDomainSaveImageOpen(struct qemud_driver *driver, - const char *path, - virDomainDefPtr *ret_def, - struct qemud_save_header *ret_header, - pid_t *ret_read_pid) +static int ATTRIBUTE_NONNULL(3) ATTRIBUTE_NONNULL(4) +qemuDomainSaveImageOpen(struct qemud_driver *driver, + const char *path, + virDomainDefPtr *ret_def, + struct qemud_save_header *ret_header) { int fd; - pid_t read_pid = -1; struct qemud_save_header header; char *xml = NULL; virDomainDefPtr def = NULL; - if ((fd = open(path, O_RDONLY)) < 0) { - if ((driver->user == 0) || (getuid() != 0)) { + if ((fd = virFileOpenAs(path, O_RDONLY, 0, getuid(), getgid(), 0)) < 0) { + if ((fd != -EACCES && fd != -EPERM) || + driver->user == getuid()) { qemuReportError(VIR_ERR_OPERATION_FAILED, "%s", _("cannot read domain image")); goto error; } /* Opening as root failed, but qemu runs as a different user - that might have better luck. Create a pipe, then fork a - child process to run as the qemu user, which will hopefully - have the necessary authority to read the file. */ - if ((fd = qemudOpenAsUID(path, - driver->user, driver->group, &read_pid)) < 0) { - /* error already reported */ + * that might have better luck. */ + if ((fd = virFileOpenAs(path, O_RDONLY, 0, + driver->user, driver->group, + VIR_FILE_OPEN_AS_UID)) < 0) { + qemuReportError(VIR_ERR_OPERATION_FAILED, + "%s", _("cannot read domain image")); goto error; } } @@ -3238,34 +3087,30 @@ qemudDomainSaveImageOpen(struct qemud_driver *driver, *ret_def = def; *ret_header = header; - *ret_read_pid = read_pid; return fd; error: virDomainDefFree(def); VIR_FREE(xml); - qemudDomainSaveImageClose(fd, read_pid, NULL); + VIR_FORCE_CLOSE(fd); return -1; } -static int ATTRIBUTE_NONNULL(6) -qemudDomainSaveImageStartVM(virConnectPtr conn, - struct qemud_driver *driver, - virDomainObjPtr vm, - int *fd, - pid_t *read_pid, - const struct qemud_save_header *header, - const char *path) +static int ATTRIBUTE_NONNULL(4) ATTRIBUTE_NONNULL(5) ATTRIBUTE_NONNULL(6) +qemuDomainSaveImageStartVM(virConnectPtr conn, + struct qemud_driver *driver, + virDomainObjPtr vm, + int *fd, + const struct qemud_save_header *header, + const char *path) { int ret = -1; virDomainEventPtr event; int intermediatefd = -1; pid_t intermediate_pid = -1; int childstat; - int wait_ret; - int status; if (header->version == 2) { const char *intermediate_argv[3] = { NULL, "-dc", NULL }; @@ -3298,8 +3143,9 @@ qemudDomainSaveImageStartVM(virConnectPtr conn, if (intermediate_pid != -1) { if (ret < 0) { - /* if there was an error setting up qemu, the intermediate process will - * wait forever to write to stdout, so we must manually kill it. + /* if there was an error setting up qemu, the intermediate + * process will wait forever to write to stdout, so we + * must manually kill it. */ VIR_FORCE_CLOSE(intermediatefd); VIR_FORCE_CLOSE(*fd); @@ -3314,30 +3160,10 @@ qemudDomainSaveImageStartVM(virConnectPtr conn, } VIR_FORCE_CLOSE(intermediatefd); - wait_ret = qemudDomainSaveImageClose(*fd, *read_pid, &status); - *fd = -1; - if (*read_pid != -1) { - if (wait_ret == -1) { - virReportSystemError(errno, - _("failed to wait for process reading '%s'"), - path); - ret = -1; - } else if (!WIFEXITED(status)) { - qemuReportError(VIR_ERR_OPERATION_FAILED, - _("child process exited abnormally reading '%s'"), - path); - ret = -1; - } else { - int exit_status = WEXITSTATUS(status); - if (exit_status != 0) { - virReportSystemError(exit_status, - _("child process returned error reading '%s'"), - path); - ret = -1; - } - } + if (VIR_CLOSE(*fd) < 0) { + virReportSystemError(errno, _("cannot close file: %s"), path); + ret = -1; } - *read_pid = -1; if (ret < 0) { qemuAuditDomainStart(vm, "restored", false); @@ -3376,19 +3202,20 @@ out: return ret; } -static int qemudDomainRestore(virConnectPtr conn, - const char *path) { +static int +qemuDomainRestore(virConnectPtr conn, + const char *path) +{ struct qemud_driver *driver = conn->privateData; virDomainDefPtr def = NULL; virDomainObjPtr vm = NULL; int fd = -1; - pid_t read_pid = -1; int ret = -1; struct qemud_save_header header; qemuDriverLock(driver); - fd = qemudDomainSaveImageOpen(driver, path, &def, &header, &read_pid); + fd = qemuDomainSaveImageOpen(driver, path, &def, &header); if (fd < 0) goto cleanup; @@ -3406,8 +3233,7 @@ static int qemudDomainRestore(virConnectPtr conn, if (qemuDomainObjBeginJobWithDriver(driver, vm) < 0) goto cleanup; - ret = qemudDomainSaveImageStartVM(conn, driver, vm, &fd, - &read_pid, &header, path); + ret = qemuDomainSaveImageStartVM(conn, driver, vm, &fd, &header, path); if (qemuDomainObjEndJob(vm) == 0) vm = NULL; @@ -3418,25 +3244,25 @@ static int qemudDomainRestore(virConnectPtr conn, cleanup: virDomainDefFree(def); - qemudDomainSaveImageClose(fd, read_pid, NULL); + VIR_FORCE_CLOSE(fd); if (vm) virDomainObjUnlock(vm); qemuDriverUnlock(driver); return ret; } -static int qemudDomainObjRestore(virConnectPtr conn, - struct qemud_driver *driver, - virDomainObjPtr vm, - const char *path) +static int +qemuDomainObjRestore(virConnectPtr conn, + struct qemud_driver *driver, + virDomainObjPtr vm, + const char *path) { virDomainDefPtr def = NULL; int fd = -1; - pid_t read_pid = -1; int ret = -1; struct qemud_save_header header; - fd = qemudDomainSaveImageOpen(driver, path, &def, &header, &read_pid); + fd = qemuDomainSaveImageOpen(driver, path, &def, &header); if (fd < 0) goto cleanup; @@ -3457,12 +3283,11 @@ static int qemudDomainObjRestore(virConnectPtr conn, virDomainObjAssignDef(vm, def, true); def = NULL; - ret = qemudDomainSaveImageStartVM(conn, driver, vm, &fd, - &read_pid, &header, path); + ret = qemuDomainSaveImageStartVM(conn, driver, vm, &fd, &header, path); cleanup: virDomainDefFree(def); - qemudDomainSaveImageClose(fd, read_pid, NULL); + VIR_FORCE_CLOSE(fd); return ret; } @@ -3673,7 +3498,7 @@ static int qemudDomainObjStart(virConnectPtr conn, */ managed_save = qemuDomainManagedSavePath(driver, vm); if ((managed_save) && (virFileExists(managed_save))) { - ret = qemudDomainObjRestore(conn, driver, vm, managed_save); + ret = qemuDomainObjRestore(conn, driver, vm, managed_save); if (unlink(managed_save) < 0) { VIR_WARN("Failed to remove the managed state %s", managed_save); @@ -6810,7 +6635,7 @@ static virDriver qemuDriver = { qemudDomainSetMemory, /* domainSetMemory */ qemudDomainGetInfo, /* domainGetInfo */ qemudDomainSave, /* domainSave */ - qemudDomainRestore, /* domainRestore */ + qemuDomainRestore, /* domainRestore */ qemudDomainCoreDump, /* domainCoreDump */ qemudDomainSetVcpus, /* domainSetVcpus */ qemudDomainSetVcpusFlags, /* domainSetVcpusFlags */ -- 1.7.4

On Wed, Mar 09, 2011 at 07:18:30PM -0700, Eric Blake wrote:
Direct access to an open file is so much simpler than passing everything through a pipe!
Did you test with SELinux in enforcing mode - there's a very slight chance this could impact things, though I think we do label all FDs correctly these days.
* src/qemu/qemu_driver.c (qemudOpenAsUID) (qemudDomainSaveImageClose): Delete. (qemudDomainSaveImageOpen): Rename... (qemuDomainSaveImageOpen): ...and drop read_pid argument. Use virFileOpenAs instead of qemudOpenAsUID. (qemudDomainSaveImageStartVM, qemudDomainRestore) (qemudDomainObjRestore): Rename... (qemuDomainSaveImageStartVM, qemuDomainRestore) (qemDomainObjRestore): ...and simplify accordingly. (qemudDomainObjStart, qemuDriver): Update callers. --- src/qemu/qemu_driver.c | 265 ++++++++--------------------------------------- 1 files changed, 45 insertions(+), 220 deletions(-)
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 03/10/2011 05:14 AM, Daniel P. Berrange wrote:
On Wed, Mar 09, 2011 at 07:18:30PM -0700, Eric Blake wrote:
Direct access to an open file is so much simpler than passing everything through a pipe!
Did you test with SELinux in enforcing mode - there's a very slight chance this could impact things, though I think we do label all FDs correctly these days.
Yes, I did, and things still worked to a local disk (where I'm assuming SELinux enforcing would have stopped me). In fact, when passing the fd via SCM_RIGHTS, things still worked even without relabeling the file (although I have no idea if that's a hole in SELinux enforcing). I'm still in the middle of testing saving to a root-squash NFS location, then I will repost this series for one last review.
* src/qemu/qemu_driver.c (qemudOpenAsUID) (qemudDomainSaveImageClose): Delete. (qemudDomainSaveImageOpen): Rename... (qemuDomainSaveImageOpen): ...and drop read_pid argument. Use virFileOpenAs instead of qemudOpenAsUID.
-- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On 03/25/2011 03:56 PM, Eric Blake wrote:
On 03/10/2011 05:14 AM, Daniel P. Berrange wrote:
On Wed, Mar 09, 2011 at 07:18:30PM -0700, Eric Blake wrote:
Direct access to an open file is so much simpler than passing everything through a pipe!
Did you test with SELinux in enforcing mode - there's a very slight chance this could impact things, though I think we do label all FDs correctly these days.
Yes, I did, and things still worked to a local disk (where I'm assuming SELinux enforcing would have stopped me). In fact, when passing the fd via SCM_RIGHTS, things still worked even without relabeling the file (although I have no idea if that's a hole in SELinux enforcing).
Then again, my testing was done via running just-built daemon/libvirtd, which probably means I was in an unconfined context, where transitions just work. I guess that means I need to retest with libvirtd running as a service under its normal SELinux settings, which might make a difference. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

This points out that core dumps (still) don't work for root-squash NFS, since the fd is not opened correctly. This patch should not introduce any functionality change, it is just a refactoring to avoid duplicated code. * src/qemu/qemu_driver.c (qemuDomainMigrateToFile): New function. (qemudDomainSaveFlag, doCoreDump): Use it. --- src/qemu/qemu_driver.c | 249 +++++++++++++++++++++--------------------------- 1 files changed, 110 insertions(+), 139 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 9de19ea..2422482 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -1766,6 +1766,101 @@ endjob: return ret; } +/* Internal function called while driver lock is held and vm is active. */ +static int +qemuDomainMigrateToFile(struct qemud_driver *driver, virDomainObjPtr vm, + virBitmapPtr qemuCaps, + int fd, off_t offset, const char *path, + int compressed, + bool is_reg, bool bypassSecurityDriver) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + virCgroupPtr cgroup = NULL; + int ret = -1; + int rc; + + if (!is_reg && + qemuCgroupControllerActive(driver, VIR_CGROUP_CONTROLLER_DEVICES)) { + if (virCgroupForDomain(driver->cgroup, vm->def->name, + &cgroup, 0) != 0) { + qemuReportError(VIR_ERR_INTERNAL_ERROR, + _("Unable to find cgroup for %s"), + vm->def->name); + goto cleanup; + } + rc = virCgroupAllowDevicePath(cgroup, path, + VIR_CGROUP_DEVICE_RW); + qemuAuditCgroupPath(vm, cgroup, "allow", path, "rw", rc); + if (rc < 0) { + virReportSystemError(-rc, + _("Unable to allow device %s for %s"), + path, vm->def->name); + goto cleanup; + } + } + + if ((!bypassSecurityDriver) && + virSecurityManagerSetSavedStateLabel(driver->securityManager, + vm, path) < 0) + goto cleanup; + + if (compressed == QEMUD_SAVE_FORMAT_RAW) { + const char *args[] = { "cat", NULL }; + + qemuDomainObjEnterMonitorWithDriver(driver, vm); + if (qemuCaps && qemuCapsGet(qemuCaps, QEMU_CAPS_MIGRATE_QEMU_FD) && + priv->monConfig->type == VIR_DOMAIN_CHR_TYPE_UNIX) { + rc = qemuMonitorMigrateToFd(priv->mon, + QEMU_MONITOR_MIGRATE_BACKGROUND, + fd); + } else { + rc = qemuMonitorMigrateToFile(priv->mon, + QEMU_MONITOR_MIGRATE_BACKGROUND, + args, path, offset); + } + qemuDomainObjExitMonitorWithDriver(driver, vm); + } else { + const char *prog = qemudSaveCompressionTypeToString(compressed); + const char *args[] = { + prog, + "-c", + NULL + }; + qemuDomainObjEnterMonitorWithDriver(driver, vm); + rc = qemuMonitorMigrateToFile(priv->mon, + QEMU_MONITOR_MIGRATE_BACKGROUND, + args, path, offset); + qemuDomainObjExitMonitorWithDriver(driver, vm); + } + + if (rc < 0) + goto cleanup; + + rc = qemuMigrationWaitForCompletion(driver, vm); + + if (rc < 0) + goto cleanup; + + ret = 0; + +cleanup: + if ((!bypassSecurityDriver) && + virSecurityManagerRestoreSavedStateLabel(driver->securityManager, + vm, path) < 0) + VIR_WARN("failed to restore save state label on %s", path); + + if (cgroup != NULL) { + rc = virCgroupDenyDevicePath(cgroup, path, + VIR_CGROUP_DEVICE_RWM); + qemuAuditCgroupPath(vm, cgroup, "deny", path, "rwm", rc); + if (rc < 0) + VIR_WARN("Unable to deny device %s for %s %d", + path, vm->def->name, rc); + virCgroupFree(&cgroup); + } + return ret; +} + /* This internal function expects the driver lock to already be held on * entry and the vm must be active. */ @@ -1775,15 +1870,14 @@ static int qemudDomainSaveFlag(struct qemud_driver *driver, virDomainPtr dom, { char *xml = NULL; struct qemud_save_header header; - int bypassSecurityDriver = 0; + bool bypassSecurityDriver = false; int ret = -1; int rc; virDomainEventPtr event = NULL; qemuDomainObjPrivatePtr priv; struct stat sb; - int is_reg = 0; + bool is_reg = false; unsigned long long offset; - virCgroupPtr cgroup = NULL; virBitmapPtr qemuCaps = NULL; int fd = -1; @@ -1838,9 +1932,9 @@ static int qemudDomainSaveFlag(struct qemud_driver *driver, virDomainPtr dom, * that with NFS we can't actually stat() the file. * The subsequent codepaths will still raise an error * if a truely fatal problem is hit */ - is_reg = 1; + is_reg = true; } else { - is_reg = S_ISREG(sb.st_mode); + is_reg = !!S_ISREG(sb.st_mode); } offset = sizeof(header) + header.xml_len; @@ -1935,7 +2029,7 @@ static int qemudDomainSaveFlag(struct qemud_driver *driver, virDomainPtr dom, is NFS, we assume it's a root-squashing NFS share, and that the security driver stuff would have failed anyway */ - bypassSecurityDriver = 1; + bypassSecurityDriver = true; } } @@ -1945,86 +2039,13 @@ static int qemudDomainSaveFlag(struct qemud_driver *driver, virDomainPtr dom, goto endjob; } - /* Allow qemu to access file */ - - if (!is_reg && - qemuCgroupControllerActive(driver, VIR_CGROUP_CONTROLLER_DEVICES)) { - if (virCgroupForDomain(driver->cgroup, vm->def->name, &cgroup, 0) != 0) { - qemuReportError(VIR_ERR_INTERNAL_ERROR, - _("Unable to find cgroup for %s"), - vm->def->name); - goto endjob; - } - rc = virCgroupAllowDevicePath(cgroup, path, - VIR_CGROUP_DEVICE_RW); - qemuAuditCgroupPath(vm, cgroup, "allow", path, "rw", rc); - if (rc < 0) { - virReportSystemError(-rc, - _("Unable to allow device %s for %s"), - path, vm->def->name); - goto endjob; - } - } - - if ((!bypassSecurityDriver) && - virSecurityManagerSetSavedStateLabel(driver->securityManager, - vm, path) < 0) - goto endjob; - - if (header.compressed == QEMUD_SAVE_FORMAT_RAW) { - const char *args[] = { "cat", NULL }; - - qemuDomainObjEnterMonitorWithDriver(driver, vm); - if (qemuCapsGet(qemuCaps, QEMU_CAPS_MIGRATE_QEMU_FD) && - priv->monConfig->type == VIR_DOMAIN_CHR_TYPE_UNIX) { - rc = qemuMonitorMigrateToFd(priv->mon, - QEMU_MONITOR_MIGRATE_BACKGROUND, - fd); - } else { - rc = qemuMonitorMigrateToFile(priv->mon, - QEMU_MONITOR_MIGRATE_BACKGROUND, - args, path, offset); - } - qemuDomainObjExitMonitorWithDriver(driver, vm); - } else { - const char *prog = qemudSaveCompressionTypeToString(header.compressed); - const char *args[] = { - prog, - "-c", - NULL - }; - qemuDomainObjEnterMonitorWithDriver(driver, vm); - rc = qemuMonitorMigrateToFile(priv->mon, - QEMU_MONITOR_MIGRATE_BACKGROUND, - args, path, offset); - qemuDomainObjExitMonitorWithDriver(driver, vm); - } - - if (rc < 0) - goto endjob; - + /* Perform the migration */ + if (qemuDomainMigrateToFile(driver, vm, qemuCaps, fd, offset, path, + compressed, is_reg, bypassSecurityDriver) < 0) + goto cleanup; if (VIR_CLOSE(fd) < 0) { virReportSystemError(errno, _("unable to close %s"), path); - goto endjob; - } - - rc = qemuMigrationWaitForCompletion(driver, vm); - - if (rc < 0) - goto endjob; - - if ((!bypassSecurityDriver) && - virSecurityManagerRestoreSavedStateLabel(driver->securityManager, - vm, path) < 0) - VIR_WARN("failed to restore save state label on %s", path); - - if (cgroup != NULL) { - rc = virCgroupDenyDevicePath(cgroup, path, - VIR_CGROUP_DEVICE_RWM); - qemuAuditCgroupPath(vm, cgroup, "deny", path, "rwm", rc); - if (rc < 0) - VIR_WARN("Unable to deny device %s for %s %d", - path, vm->def->name, rc); + goto cleanup; } ret = 0; @@ -2050,22 +2071,7 @@ endjob: if (rc < 0) VIR_WARN0("Unable to resume guest CPUs after save failure"); } - - if (cgroup != NULL) { - rc = virCgroupDenyDevicePath(cgroup, path, - VIR_CGROUP_DEVICE_RWM); - qemuAuditCgroupPath(vm, cgroup, "deny", path, "rwm", rc); - if (rc < 0) - VIR_WARN("Unable to deny device %s for %s: %d", - path, vm->def->name, rc); - } - - if ((!bypassSecurityDriver) && - virSecurityManagerRestoreSavedStateLabel(driver->securityManager, - vm, path) < 0) - VIR_WARN("failed to restore save state label on %s", path); } - if (qemuDomainObjEndJob(vm) == 0) vm = NULL; } @@ -2078,7 +2084,6 @@ cleanup: unlink(path); if (event) qemuDomainEventQueue(driver, event); - virCgroupFree(&cgroup); return ret; } @@ -2282,9 +2287,6 @@ static int doCoreDump(struct qemud_driver *driver, { int fd = -1; int ret = -1; - qemuDomainObjPrivatePtr priv; - - priv = vm->privateData; /* Create an empty file with appropriate ownership. */ if ((fd = open(path, O_CREAT|O_TRUNC|O_WRONLY, S_IRUSR|S_IWUSR)) < 0) { @@ -2293,6 +2295,10 @@ static int doCoreDump(struct qemud_driver *driver, goto cleanup; } + if (qemuDomainMigrateToFile(driver, vm, NULL, fd, 0, path, + compress, true, false) < 0) + goto cleanup; + if (VIR_CLOSE(fd) < 0) { virReportSystemError(errno, _("unable to save file %s"), @@ -2300,42 +2306,7 @@ static int doCoreDump(struct qemud_driver *driver, goto cleanup; } - if (virSecurityManagerSetSavedStateLabel(driver->securityManager, - vm, path) < 0) - goto cleanup; - - qemuDomainObjEnterMonitorWithDriver(driver, vm); - if (compress == QEMUD_SAVE_FORMAT_RAW) { - const char *args[] = { - "cat", - NULL, - }; - ret = qemuMonitorMigrateToFile(priv->mon, - QEMU_MONITOR_MIGRATE_BACKGROUND, - args, path, 0); - } else { - const char *prog = qemudSaveCompressionTypeToString(compress); - const char *args[] = { - prog, - "-c", - NULL, - }; - ret = qemuMonitorMigrateToFile(priv->mon, - QEMU_MONITOR_MIGRATE_BACKGROUND, - args, path, 0); - } - qemuDomainObjExitMonitorWithDriver(driver, vm); - if (ret < 0) - goto cleanup; - - ret = qemuMigrationWaitForCompletion(driver, vm); - - if (ret < 0) - goto cleanup; - - if (virSecurityManagerRestoreSavedStateLabel(driver->securityManager, - vm, path) < 0) - goto cleanup; + ret = 0; cleanup: if (ret != 0) -- 1.7.4

On Wed, Mar 09, 2011 at 07:18:31PM -0700, Eric Blake wrote:
This points out that core dumps (still) don't work for root-squash NFS, since the fd is not opened correctly. This patch should not introduce any functionality change, it is just a refactoring to avoid duplicated code.
* src/qemu/qemu_driver.c (qemuDomainMigrateToFile): New function. (qemudDomainSaveFlag, doCoreDump): Use it. --- src/qemu/qemu_driver.c | 249 +++++++++++++++++++++--------------------------- 1 files changed, 110 insertions(+), 139 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 9de19ea..2422482 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -1766,6 +1766,101 @@ endjob: return ret; }
+/* Internal function called while driver lock is held and vm is active. */ +static int +qemuDomainMigrateToFile(struct qemud_driver *driver, virDomainObjPtr vm, + virBitmapPtr qemuCaps, + int fd, off_t offset, const char *path, + int compressed, + bool is_reg, bool bypassSecurityDriver) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + virCgroupPtr cgroup = NULL; + int ret = -1; + int rc; + + if (!is_reg && + qemuCgroupControllerActive(driver, VIR_CGROUP_CONTROLLER_DEVICES)) { + if (virCgroupForDomain(driver->cgroup, vm->def->name, + &cgroup, 0) != 0) { + qemuReportError(VIR_ERR_INTERNAL_ERROR, + _("Unable to find cgroup for %s"), + vm->def->name); + goto cleanup; + } + rc = virCgroupAllowDevicePath(cgroup, path, + VIR_CGROUP_DEVICE_RW); + qemuAuditCgroupPath(vm, cgroup, "allow", path, "rw", rc); + if (rc < 0) { + virReportSystemError(-rc, + _("Unable to allow device %s for %s"), + path, vm->def->name); + goto cleanup; + } + } + + if ((!bypassSecurityDriver) && + virSecurityManagerSetSavedStateLabel(driver->securityManager, + vm, path) < 0) + goto cleanup; + + if (compressed == QEMUD_SAVE_FORMAT_RAW) { + const char *args[] = { "cat", NULL }; + + qemuDomainObjEnterMonitorWithDriver(driver, vm); + if (qemuCaps && qemuCapsGet(qemuCaps, QEMU_CAPS_MIGRATE_QEMU_FD) && + priv->monConfig->type == VIR_DOMAIN_CHR_TYPE_UNIX) { + rc = qemuMonitorMigrateToFd(priv->mon, + QEMU_MONITOR_MIGRATE_BACKGROUND, + fd); + } else { + rc = qemuMonitorMigrateToFile(priv->mon, + QEMU_MONITOR_MIGRATE_BACKGROUND, + args, path, offset); + } + qemuDomainObjExitMonitorWithDriver(driver, vm); + } else { + const char *prog = qemudSaveCompressionTypeToString(compressed); + const char *args[] = { + prog, + "-c", + NULL + }; + qemuDomainObjEnterMonitorWithDriver(driver, vm); + rc = qemuMonitorMigrateToFile(priv->mon, + QEMU_MONITOR_MIGRATE_BACKGROUND, + args, path, offset); + qemuDomainObjExitMonitorWithDriver(driver, vm); + } + + if (rc < 0) + goto cleanup; + + rc = qemuMigrationWaitForCompletion(driver, vm); + + if (rc < 0) + goto cleanup; + + ret = 0; + +cleanup: + if ((!bypassSecurityDriver) && + virSecurityManagerRestoreSavedStateLabel(driver->securityManager, + vm, path) < 0) + VIR_WARN("failed to restore save state label on %s", path); + + if (cgroup != NULL) { + rc = virCgroupDenyDevicePath(cgroup, path, + VIR_CGROUP_DEVICE_RWM); + qemuAuditCgroupPath(vm, cgroup, "deny", path, "rwm", rc); + if (rc < 0) + VIR_WARN("Unable to deny device %s for %s %d", + path, vm->def->name, rc); + virCgroupFree(&cgroup); + } + return ret; +}
I think it could be worth moving this method into qemu_migrate.h/.c now it is decoupled from the public API objects/methods. ACK to the refactoring regardless. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 03/10/2011 05:16 AM, Daniel P. Berrange wrote:
On Wed, Mar 09, 2011 at 07:18:31PM -0700, Eric Blake wrote:
This points out that core dumps (still) don't work for root-squash NFS, since the fd is not opened correctly. This patch should not introduce any functionality change, it is just a refactoring to avoid duplicated code.
* src/qemu/qemu_driver.c (qemuDomainMigrateToFile): New function. (qemudDomainSaveFlag, doCoreDump): Use it.
I think it could be worth moving this method into qemu_migrate.h/.c now it is decoupled from the public API objects/methods.
Agreed; and will do that for v5 coming up once my NFS testing completes.
ACK to the refactoring regardless.
-- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

SELinux labeling and cgroup ACLs aren't required if we hand a pre-opened fd to qemu. All the more reason to love fd: migration. * src/qemu/qemu_driver.c (qemuDomainMigrateToFile): Skip steps that are irrelevant in fd migration. --- src/qemu/qemu_driver.c | 65 +++++++++++++++++++++++++++++------------------ 1 files changed, 40 insertions(+), 25 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 2422482..a679b2c 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -1779,35 +1779,52 @@ qemuDomainMigrateToFile(struct qemud_driver *driver, virDomainObjPtr vm, int ret = -1; int rc; - if (!is_reg && - qemuCgroupControllerActive(driver, VIR_CGROUP_CONTROLLER_DEVICES)) { - if (virCgroupForDomain(driver->cgroup, vm->def->name, - &cgroup, 0) != 0) { - qemuReportError(VIR_ERR_INTERNAL_ERROR, - _("Unable to find cgroup for %s"), - vm->def->name); - goto cleanup; + if (qemuCaps && qemuCapsGet(qemuCaps, QEMU_CAPS_MIGRATE_QEMU_FD) && + priv->monConfig->type == VIR_DOMAIN_CHR_TYPE_UNIX && + compressed == QEMUD_SAVE_FORMAT_RAW) { + /* All right! We can use fd migration, which means that qemu + * doesn't have to open() the file, so we don't have to futz + * around with granting access or revoking it later. */ + is_reg = true; + bypassSecurityDriver = true; + } else { + /* Phooey - we have to fall back on exec migration, where qemu + * has to popen() the file by name. We might also stumble on + * the qemu race where it does a wait() that botches pclose(). + * FIXME: can we reliably detect and ignore that particular + * failure where the migration completes but the reported + * status is wrong? */ + if (!is_reg && + qemuCgroupControllerActive(driver, + VIR_CGROUP_CONTROLLER_DEVICES)) { + if (virCgroupForDomain(driver->cgroup, vm->def->name, + &cgroup, 0) != 0) { + qemuReportError(VIR_ERR_INTERNAL_ERROR, + _("Unable to find cgroup for %s"), + vm->def->name); + goto cleanup; + } + rc = virCgroupAllowDevicePath(cgroup, path, + VIR_CGROUP_DEVICE_RW); + qemuAuditCgroupPath(vm, cgroup, "allow", path, "rw", rc); + if (rc < 0) { + virReportSystemError(-rc, + _("Unable to allow device %s for %s"), + path, vm->def->name); + goto cleanup; + } } - rc = virCgroupAllowDevicePath(cgroup, path, - VIR_CGROUP_DEVICE_RW); - qemuAuditCgroupPath(vm, cgroup, "allow", path, "rw", rc); - if (rc < 0) { - virReportSystemError(-rc, - _("Unable to allow device %s for %s"), - path, vm->def->name); + + if ((!bypassSecurityDriver) && + virSecurityManagerSetSavedStateLabel(driver->securityManager, + vm, path) < 0) goto cleanup; - } } - if ((!bypassSecurityDriver) && - virSecurityManagerSetSavedStateLabel(driver->securityManager, - vm, path) < 0) - goto cleanup; - + qemuDomainObjEnterMonitorWithDriver(driver, vm); if (compressed == QEMUD_SAVE_FORMAT_RAW) { const char *args[] = { "cat", NULL }; - qemuDomainObjEnterMonitorWithDriver(driver, vm); if (qemuCaps && qemuCapsGet(qemuCaps, QEMU_CAPS_MIGRATE_QEMU_FD) && priv->monConfig->type == VIR_DOMAIN_CHR_TYPE_UNIX) { rc = qemuMonitorMigrateToFd(priv->mon, @@ -1818,7 +1835,6 @@ qemuDomainMigrateToFile(struct qemud_driver *driver, virDomainObjPtr vm, QEMU_MONITOR_MIGRATE_BACKGROUND, args, path, offset); } - qemuDomainObjExitMonitorWithDriver(driver, vm); } else { const char *prog = qemudSaveCompressionTypeToString(compressed); const char *args[] = { @@ -1826,12 +1842,11 @@ qemuDomainMigrateToFile(struct qemud_driver *driver, virDomainObjPtr vm, "-c", NULL }; - qemuDomainObjEnterMonitorWithDriver(driver, vm); rc = qemuMonitorMigrateToFile(priv->mon, QEMU_MONITOR_MIGRATE_BACKGROUND, args, path, offset); - qemuDomainObjExitMonitorWithDriver(driver, vm); } + qemuDomainObjExitMonitorWithDriver(driver, vm); if (rc < 0) goto cleanup; -- 1.7.4

On Wed, Mar 09, 2011 at 07:18:32PM -0700, Eric Blake wrote:
SELinux labeling and cgroup ACLs aren't required if we hand a pre-opened fd to qemu. All the more reason to love fd: migration.
I know that holds true for cgroups which checks on open() only, but are you absolutely sure about for SELinux? SELinux checks FDs on every single syscall. I'm really fuzzy about what happens to an FD's associated security context when you pass it over an UNIX socket using SCM_RIGHTS. I think it might 'just work' as we already do this with TAP devices and don't label them, but it could be we have a generic policy rule related to TAP devices. If it passed testing with SELinux in enforcing mode, then ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 03/10/2011 05:19 AM, Daniel P. Berrange wrote:
On Wed, Mar 09, 2011 at 07:18:32PM -0700, Eric Blake wrote:
SELinux labeling and cgroup ACLs aren't required if we hand a pre-opened fd to qemu. All the more reason to love fd: migration.
I know that holds true for cgroups which checks on open() only, but are you absolutely sure about for SELinux? SELinux checks FDs on every single syscall. I'm really fuzzy about what happens to an FD's associated security context when you pass it over an UNIX socket using SCM_RIGHTS. I think it might 'just work' as we already do this with TAP devices and don't label them, but it could be we have a generic policy rule related to TAP devices.
Whether surprising or not, it worked without doing any labeling on the fd. I don't know if that's a hole in SELinux. In fact, it's more than just the SELinux labeling - it's also the DAC labeling (that is, pre-patch, the file is owned by qemu:qemu during the migration, then chown'd back to root:root on completion; post-patch, the file is never chown'd in the first place, and it is the SCM_RIGHTS of the open fd that lets a non-root process write into a 600 root:root file). But that's not a hole (it's always been possible in Unix to do things on open fds where you can't do the same by attempting open() yourself on the same underlying file - for example, open(file,O_CREAT|O_RDWR,0000) lets you read and write into a temporary fd that not even another process with the same uid can reopen. Or put another way, an open fd is stateful - it remembers the permissions at the time of the open, and not is not impacted by any intervening chmod or chown of the underyling file).
If it passed testing with SELinux in enforcing mode, then ACK
It does indeed pass (to my relief). -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

Spawn the compressor ourselves, instead of requiring the shell. * src/qemu/qemu_driver.c (qemuDomainMigrateToFile): Spawn compression helper process when needed. --- src/qemu/qemu_driver.c | 37 ++++++++++++++++++++++++++++++++----- 1 files changed, 32 insertions(+), 5 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index a679b2c..02c3c48 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -1778,10 +1778,12 @@ qemuDomainMigrateToFile(struct qemud_driver *driver, virDomainObjPtr vm, virCgroupPtr cgroup = NULL; int ret = -1; int rc; + virCommandPtr cmd = NULL; + int pipeFD[2] = { -1, -1 }; if (qemuCaps && qemuCapsGet(qemuCaps, QEMU_CAPS_MIGRATE_QEMU_FD) && priv->monConfig->type == VIR_DOMAIN_CHR_TYPE_UNIX && - compressed == QEMUD_SAVE_FORMAT_RAW) { + (compressed == QEMUD_SAVE_FORMAT_RAW || pipe(pipeFD) == 0)) { /* All right! We can use fd migration, which means that qemu * doesn't have to open() the file, so we don't have to futz * around with granting access or revoking it later. */ @@ -1842,9 +1844,26 @@ qemuDomainMigrateToFile(struct qemud_driver *driver, virDomainObjPtr vm, "-c", NULL }; - rc = qemuMonitorMigrateToFile(priv->mon, - QEMU_MONITOR_MIGRATE_BACKGROUND, - args, path, offset); + if (qemuCaps && qemuCapsGet(qemuCaps, QEMU_CAPS_MIGRATE_QEMU_FD) && + priv->monConfig->type == VIR_DOMAIN_CHR_TYPE_UNIX) { + cmd = virCommandNewArgs(args); + virCommandSetInputFD(cmd, pipeFD[0]); + virCommandSetOutputFD(cmd, &fd); + if (virCommandRunAsync(cmd, NULL) < 0) { + qemuDomainObjExitMonitorWithDriver(driver, vm); + goto cleanup; + } + rc = qemuMonitorMigrateToFd(priv->mon, + QEMU_MONITOR_MIGRATE_BACKGROUND, + pipeFD[1]); + if (VIR_CLOSE(pipeFD[0]) < 0 || + VIR_CLOSE(pipeFD[1]) < 0) + VIR_WARN0("failed to close intermediate pipe"); + } else { + rc = qemuMonitorMigrateToFile(priv->mon, + QEMU_MONITOR_MIGRATE_BACKGROUND, + args, path, offset); + } } qemuDomainObjExitMonitorWithDriver(driver, vm); @@ -1852,13 +1871,21 @@ qemuDomainMigrateToFile(struct qemud_driver *driver, virDomainObjPtr vm, goto cleanup; rc = qemuMigrationWaitForCompletion(driver, vm); - if (rc < 0) goto cleanup; + if (cmd && virCommandWait(cmd, NULL) < 0) + goto cleanup; + ret = 0; cleanup: + VIR_FORCE_CLOSE(pipeFD[0]); + VIR_FORCE_CLOSE(pipeFD[1]); + /* FIXME - virCommandWait can overwrite errors; need to add + * virCommandKill that does the job silently */ + ignore_value(virCommandWait(cmd, NULL)); + virCommandFree(cmd); if ((!bypassSecurityDriver) && virSecurityManagerRestoreSavedStateLabel(driver->securityManager, vm, path) < 0) -- 1.7.4

On 03/10/2011 03:18 AM, Eric Blake wrote:
+ if (qemuCaps&& qemuCapsGet(qemuCaps, QEMU_CAPS_MIGRATE_QEMU_FD)&& + priv->monConfig->type == VIR_DOMAIN_CHR_TYPE_UNIX) { + cmd = virCommandNewArgs(args); + virCommandSetInputFD(cmd, pipeFD[0]); + virCommandSetOutputFD(cmd,&fd);
Should pipeFD[1] get cloexec treatment?
+ if (virCommandRunAsync(cmd, NULL)< 0) { + qemuDomainObjExitMonitorWithDriver(driver, vm); + goto cleanup; + } + rc = qemuMonitorMigrateToFd(priv->mon, + QEMU_MONITOR_MIGRATE_BACKGROUND, + pipeFD[1]);
Paolo

On Wed, Mar 09, 2011 at 07:18:33PM -0700, Eric Blake wrote:
Spawn the compressor ourselves, instead of requiring the shell.
* src/qemu/qemu_driver.c (qemuDomainMigrateToFile): Spawn compression helper process when needed. --- src/qemu/qemu_driver.c | 37 ++++++++++++++++++++++++++++++++----- 1 files changed, 32 insertions(+), 5 deletions(-)
ACK
cleanup: + VIR_FORCE_CLOSE(pipeFD[0]); + VIR_FORCE_CLOSE(pipeFD[1]); + /* FIXME - virCommandWait can overwrite errors; need to add + * virCommandKill that does the job silently */ + ignore_value(virCommandWait(cmd, NULL)); + virCommandFree(cmd);
You can do virErrorPtr orig_err; ... cleanup: orig_err = virSaveLastError(); ...code which may overwrite errors... if (orig_err) { virSetError(orig_err); virFreeError(orig_err); } REgards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 03/09/2011 07:18 PM, Eric Blake wrote:
Spawn the compressor ourselves, instead of requiring the shell.
* src/qemu/qemu_driver.c (qemuDomainMigrateToFile): Spawn compression helper process when needed. --- src/qemu/qemu_driver.c | 37 ++++++++++++++++++++++++++++++++----- 1 files changed, 32 insertions(+), 5 deletions(-)
Phooey. Testing this reveals that we have a bigger problem. If save_image_format in qemu.conf is not raw, then we use -incoming fd:22 for 'virsh restore dom.sav'; however, we tie it to a pipe fd created by virExec. And virExec creates the libvirt side of the pipe as cloexec. So the side of the pipe that we hand to qemu is closed on exec, and the restore side causes qemu to fail with: Migration failed. Exit code fd:23(-9), exiting. That has to be fixed before I can test whether this patch works, since it takes a save/restore cycle on a compressed image to feel 100% confident (although I think that save side is correct, given that everything prior to the qemu getting EBADF worked correctly as I was stepping through the debugger). -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On Wed, Mar 09, 2011 at 07:18:29PM -0700, Eric Blake wrote:
* src/storage/storage_backend.c (createRawFileOpHook): Change signature. (struct createRawFileOpHookData): Delete unused struct. (virStorageBackendCreateRaw): Adjust caller. * src/qemu/qemu_driver.c (struct fileOpHookData): Delete unused struct. (qemudDomainSaveFileOpHook): Rename... (qemuDomainSaveFileOpHook): ...and change signature. (qemudDomainSaveFlag): Adjust caller. ---
Phooey - Just noticed that my ISP griped that git send-email sent too many mails in one session (is there a setting to tweak to force multiple sessions?) Hopefully I rethreaded this correctly.
src/qemu/qemu_driver.c | 29 +++++++++-------------------- src/storage/storage_backend.c | 34 ++++++++++++++-------------------- 2 files changed, 23 insertions(+), 40 deletions(-)
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 03/09/2011 08:45 PM, Eric Blake wrote:
Needs more testing - especially on root-squash NFS
Starting up a domain with backing store on root-squash NFS works fine. When I try to save the domain to a directory on a root-squash NFS, though, the save hangs forever with a zombified child process with a WCHAN of "exit", and the parent process is sitting on recvmsg(): Thread 4 (Thread 0x7f6045f55700 (LWP 3754)): #0 0x000000332840e9cd in recvmsg () from /lib64/libpthread.so.0 #1 0x00007f604ea7ddc9 in virFileOpenAs ( path=0x7f6038009e90 "/remote/vhost-libvirt/qemu/RHEL6.save", openflags=577, mode=432, uid=107, gid=107, flags=1) at util/util.c:1513 #2 0x00000000004416b0 in qemudDomainSaveFlag (driver=0xbb96f0, dom=0x7f60380095a0, vm=0xd219c0, path=0x7f6038009e90 "/remote/vhost-libvirt/qemu/RHEL6.save", compressed=<value optimized out>) at qemu/qemu_driver.c:1924 #3 0x0000000000441a12 in qemudDomainSave (dom=0x7f60380095a0, path=0x7f6038009e90 "/remote/vhost-libvirt/qemu/RHEL6.save") at qemu/qemu_driver.c:2143 #4 0x00007f604eadeae6 in virDomainSave (domain=0x7f60380095a0, to=0x7f6038009e90 "/remote/vhost-libvirt/qemu/RHEL6.save") at libvirt.c:2280 I get a similar hang when I try to restore from an image saved on root-squash NFS. (if it makes any difference, this is on F13, with qemu-kvm-0.13.0-1) Do you want me to try bisecting it? Access to the machine?
Eric Blake (15): qemu: use lighter-weight fd:n on incoming tunneled migration qemu: consolidate duplicated monitor migration code qemu: improve efficiency of dd during snapshots qemu: support migration to fd util: use SCM_RIGHTS in virFileOperation when needed qemu: allow simple domain save to use fd: protocol qemu: simplify domain save fd handling storage: simplify fd handling util: rename virFileOperation to virFileOpenAs util: adjust indentation in previous patch qemu, storage: improve type safety qemu: use common API for reading difficult files qemu: consolidate migration to file code qemu: skip granting access during fd migration qemu: support fd: migration with compression
src/libvirt_private.syms | 2 +- src/qemu/qemu_driver.c | 626 ++++++++++++++++------------------------- src/qemu/qemu_migration.c | 45 +-- src/qemu/qemu_monitor.c | 124 ++++++++- src/qemu/qemu_monitor.h | 4 + src/qemu/qemu_monitor_json.c | 125 +-------- src/qemu/qemu_monitor_json.h | 23 +-- src/qemu/qemu_monitor_text.c | 123 +-------- src/qemu/qemu_monitor_text.h | 23 +-- src/storage/storage_backend.c | 80 +++--- src/util/util.c | 161 ++++++++---- src/util/util.h | 15 +- 12 files changed, 544 insertions(+), 807 deletions(-)

At 03/10/2011 01:24 PM, Laine Stump Write:
On 03/09/2011 08:45 PM, Eric Blake wrote:
Needs more testing - especially on root-squash NFS
Starting up a domain with backing store on root-squash NFS works fine.
When I try to save the domain to a directory on a root-squash NFS, though, the save hangs forever with a zombified child process with a WCHAN of "exit", and the parent process is sitting on recvmsg():
I save the domain to a directory on a root-squash NFS, and it works fine. Restoring the domain works fine too.
Thread 4 (Thread 0x7f6045f55700 (LWP 3754)): #0 0x000000332840e9cd in recvmsg () from /lib64/libpthread.so.0 #1 0x00007f604ea7ddc9 in virFileOpenAs ( path=0x7f6038009e90 "/remote/vhost-libvirt/qemu/RHEL6.save", openflags=577, mode=432, uid=107, gid=107, flags=1) at util/util.c:1513 #2 0x00000000004416b0 in qemudDomainSaveFlag (driver=0xbb96f0, dom=0x7f60380095a0, vm=0xd219c0, path=0x7f6038009e90 "/remote/vhost-libvirt/qemu/RHEL6.save", compressed=<value optimized out>) at qemu/qemu_driver.c:1924 #3 0x0000000000441a12 in qemudDomainSave (dom=0x7f60380095a0, path=0x7f6038009e90 "/remote/vhost-libvirt/qemu/RHEL6.save") at qemu/qemu_driver.c:2143 #4 0x00007f604eadeae6 in virDomainSave (domain=0x7f60380095a0, to=0x7f6038009e90 "/remote/vhost-libvirt/qemu/RHEL6.save") at libvirt.c:2280
I get a similar hang when I try to restore from an image saved on root-squash NFS.
(if it makes any difference, this is on F13, with qemu-kvm-0.13.0-1)
Do you want me to try bisecting it? Access to the machine?
Eric Blake (15): qemu: use lighter-weight fd:n on incoming tunneled migration qemu: consolidate duplicated monitor migration code qemu: improve efficiency of dd during snapshots qemu: support migration to fd util: use SCM_RIGHTS in virFileOperation when needed qemu: allow simple domain save to use fd: protocol qemu: simplify domain save fd handling storage: simplify fd handling util: rename virFileOperation to virFileOpenAs util: adjust indentation in previous patch qemu, storage: improve type safety qemu: use common API for reading difficult files qemu: consolidate migration to file code qemu: skip granting access during fd migration qemu: support fd: migration with compression
src/libvirt_private.syms | 2 +- src/qemu/qemu_driver.c | 626 ++++++++++++++++------------------------- src/qemu/qemu_migration.c | 45 +-- src/qemu/qemu_monitor.c | 124 ++++++++- src/qemu/qemu_monitor.h | 4 + src/qemu/qemu_monitor_json.c | 125 +-------- src/qemu/qemu_monitor_json.h | 23 +-- src/qemu/qemu_monitor_text.c | 123 +-------- src/qemu/qemu_monitor_text.h | 23 +-- src/storage/storage_backend.c | 80 +++--- src/util/util.c | 161 ++++++++---- src/util/util.h | 15 +- 12 files changed, 544 insertions(+), 807 deletions(-)
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On 03/10/2011 03:59 AM, Wen Congyang wrote:
At 03/10/2011 01:24 PM, Laine Stump Write:
On 03/09/2011 08:45 PM, Eric Blake wrote:
Needs more testing - especially on root-squash NFS Starting up a domain with backing store on root-squash NFS works fine.
When I try to save the domain to a directory on a root-squash NFS, though, the save hangs forever with a zombified child process with a WCHAN of "exit", and the parent process is sitting on recvmsg(): I save the domain to a directory on a root-squash NFS, and it works fine.
Restoring the domain works fine too.
Okay, I've figured out what happened - the permission on a parent of my target directory on the NFS server had been changed during unrelated testing from (root,root) 755 to 700, so even though the target directory had the proper user (qemu,qemu), qemu was denied access to the directory due to its parent's permissions/ownership. When I set the parent directory back to 755, save and restore both work. However, the current code still has a problem - if you try to save to/from a directory that the qemu user doesn't have permission for (and presumably if there's any other error in the child that causes it to skip the sendmsg), rather than returning an error virFileOpenAs just hangs on recvmsg waiting for the fd info to come over the pipe, while the child process sits around as a zombie because nobody has reaped its exit status.

On 03/10/2011 12:54 PM, Laine Stump wrote:
Okay, I've figured out what happened - the permission on a parent of my target directory on the NFS server had been changed during unrelated testing from (root,root) 755 to 700, so even though the target directory had the proper user (qemu,qemu), qemu was denied access to the directory due to its parent's permissions/ownership. When I set the parent directory back to 755, save and restore both work.
However, the current code still has a problem - if you try to save to/from a directory that the qemu user doesn't have permission for (and presumably if there's any other error in the child that causes it to skip the sendmsg), rather than returning an error virFileOpenAs just hangs on recvmsg waiting for the fd info to come over the pipe, while the child process sits around as a zombie because nobody has reaped its exit status.
Yep, makes sense. I'm double checking whether I reliably closed the write end of the socket in all paths, and it may boil down to changing the child to _always_ sendmsg even if there is no fd to send. I'm still thinking about it. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

At 03/11/2011 04:01 AM, Eric Blake Write:
On 03/10/2011 12:54 PM, Laine Stump wrote:
Okay, I've figured out what happened - the permission on a parent of my target directory on the NFS server had been changed during unrelated testing from (root,root) 755 to 700, so even though the target directory had the proper user (qemu,qemu), qemu was denied access to the directory due to its parent's permissions/ownership. When I set the parent directory back to 755, save and restore both work.
However, the current code still has a problem - if you try to save to/from a directory that the qemu user doesn't have permission for (and presumably if there's any other error in the child that causes it to skip the sendmsg), rather than returning an error virFileOpenAs just hangs on recvmsg waiting for the fd info to come over the pipe, while the child process sits around as a zombie because nobody has reaped its exit status.
Yep, makes sense. I'm double checking whether I reliably closed the write end of the socket in all paths, and it may boil down to changing the child to _always_ sendmsg even if there is no fd to send. I'm still thinking about it.
Using SOCK_STREAM instead of SOCK_DGRAM to create socketpair can resolve this problem. But I don't know why.

On 03/11/2011 05:30 AM, Wen Congyang wrote:
Yep, makes sense. I'm double checking whether I reliably closed the write end of the socket in all paths, and it may boil down to changing the child to_always_ sendmsg even if there is no fd to send. I'm still thinking about it.
Using SOCK_STREAM instead of SOCK_DGRAM to create socketpair can resolve this problem. But I don't know why.
At least with AF_INET that would be obvious: with SOCK_STRAM exiting the child will close the connection. With a datagram socket, the parent can still wait for other messages to the same UDP port. I have no idea whether things work the same for AF_UNIX, but apparently they do. :) Paolo

On 03/11/2011 05:15 AM, Paolo Bonzini wrote:
On 03/11/2011 05:30 AM, Wen Congyang wrote:
Yep, makes sense. I'm double checking whether I reliably closed the write end of the socket in all paths, and it may boil down to changing the child to_always_ sendmsg even if there is no fd to send. I'm still thinking about it.
Using SOCK_STREAM instead of SOCK_DGRAM to create socketpair can resolve this problem. But I don't know why.
At least with AF_INET that would be obvious: with SOCK_STRAM exiting the child will close the connection. With a datagram socket, the parent can still wait for other messages to the same UDP port.
I have no idea whether things work the same for AF_UNIX, but apparently they do. :)
Not only does this fix the problem, the other uses of socketpair() in libvirt also use SOCKET_STREAM instead of SOCKET_DGRAM. So I'd say this is an acceptable fix.

At 03/11/2011 03:54 AM, Laine Stump Write:
On 03/10/2011 03:59 AM, Wen Congyang wrote:
At 03/10/2011 01:24 PM, Laine Stump Write:
On 03/09/2011 08:45 PM, Eric Blake wrote:
Needs more testing - especially on root-squash NFS Starting up a domain with backing store on root-squash NFS works fine.
When I try to save the domain to a directory on a root-squash NFS, though, the save hangs forever with a zombified child process with a WCHAN of "exit", and the parent process is sitting on recvmsg(): I save the domain to a directory on a root-squash NFS, and it works fine.
Restoring the domain works fine too.
Okay, I've figured out what happened - the permission on a parent of my target directory on the NFS server had been changed during unrelated testing from (root,root) 755 to 700, so even though the target directory had the proper user (qemu,qemu), qemu was denied access to the directory due to its parent's permissions/ownership. When I set the parent directory back to 755, save and restore both work.
Yes. I can reproduce this problem when setting the parent directory to 700.
However, the current code still has a problem - if you try to save to/from a directory that the qemu user doesn't have permission for (and presumably if there's any other error in the child that causes it to skip the sendmsg), rather than returning an error virFileOpenAs just hangs on recvmsg waiting for the fd info to come over the pipe, while the child process sits around as a zombie because nobody has reaped its exit status.
participants (6)
-
Chris Wright
-
Daniel P. Berrange
-
Eric Blake
-
Laine Stump
-
Paolo Bonzini
-
Wen Congyang