[libvirt] [PATCH 0/8] Avoid filesystem cache pollution during virsh save

Saving a domain's state creates a large file, which risks polluting the filesystem cache and slowing down a system. If a system has a lot of domians simultaneously being saved (such as the libvirt-guests init script doing managed saves), then this can cause noticeable slowdown due to filesystem thrashing. This patch series has been successfully tested to do 'virsh save dom file --direct', with lsof(1) used to verify that O_DIRECT was in use, and the resulting file was successfully used with 'virsh restore file'. Still to come - wire up O_DIRECT on the 'virsh restore' path (yes, that means adding virDomainRestoreFlags - if only we had had the foresight to use flags everywhere). Wire up qemu.conf to allow the automatic use of --direct on automatic core dumps. Wire up libvirt-guests init script to allow the use of --direct. But I had enough in place to get the review started now. Also, this series demonstrates some of the points I was making about adding a new Flags API in this thread: https://www.redhat.com/archives/libvir-list/2011-July/msg00762.html This series requires and was tested on top of these (un-acked) patches: https://www.redhat.com/archives/libvir-list/2011-July/msg00670.html https://www.redhat.com/archives/libvir-list/2011-July/msg00675.html Eric Blake (8): save: document new public API save: wire up remote protocol save: wire up trivial saveFlags implementations save: add --direct flag to virsh save operations save: let iohelper handle inherited fd save: let iohelper work on O_DIRECT fds save: add virDirectFd wrapper type save: support O_DIRECT during qemu saves cfg.mk | 1 + configure.ac | 6 +- include/libvirt/libvirt.h.in | 14 +++ po/POTFILES.in | 1 + src/Makefile.am | 1 + src/driver.h | 6 ++ src/fdstream.c | 32 +++----- src/libvirt.c | 94 +++++++++++++++++++++- src/libvirt_private.syms | 6 ++ src/libvirt_public.syms | 5 + src/libxl/libxl_driver.c | 17 ++++- src/qemu/qemu_driver.c | 76 +++++++++++++----- src/remote/remote_driver.c | 1 + src/remote/remote_protocol.x | 11 +++- src/remote_protocol-structs | 6 ++ src/test/test_driver.c | 20 ++++- src/util/iohelper.c | 185 +++++++++++++++++++++++++++++++---------- src/util/virdirect.c | 149 +++++++++++++++++++++++++++++++++ src/util/virdirect.h | 37 +++++++++ src/vbox/vbox_tmpl.c | 19 ++++- src/xen/xen_driver.c | 17 ++++- tools/virsh.c | 12 +++- tools/virsh.pod | 17 +++- 23 files changed, 632 insertions(+), 101 deletions(-) create mode 100644 src/util/virdirect.c create mode 100644 src/util/virdirect.h -- 1.7.4.4

In order to choose whether to use O_DIRECT when saving a domain image to a file, we need a new flag. But virDomainSave was implemented before our policy of all new APIs having a flag argument. * include/libvirt/libvirt.h.in (virDomainSaveFlags): New prototype. * src/libvirt.c (virDomainSaveFlags): New API. * src/libvirt_public.syms: Export it. * src/driver.h (virDrvDomainSaveFlags): New driver callback. --- include/libvirt/libvirt.h.in | 14 ++++++ src/driver.h | 6 +++ src/libvirt.c | 94 +++++++++++++++++++++++++++++++++++++++++- src/libvirt_public.syms | 5 ++ 4 files changed, 118 insertions(+), 1 deletions(-) diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index d5a7105..d9a8694 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -659,6 +659,7 @@ typedef virDomainMemoryStatStruct *virDomainMemoryStatPtr; typedef enum { VIR_DUMP_CRASH = (1 << 0), /* crash after dump */ VIR_DUMP_LIVE = (1 << 1), /* live dump */ + VIR_DUMP_DIRECT = (1 << 2), /* Use O_DIRECT while saving */ } virDomainCoreDumpFlags; /* Domain migration flags. */ @@ -941,8 +942,21 @@ int virDomainResume (virDomainPtr domain); /* * Domain save/restore */ + +/** + * virDomainSaveFlagValues: + * Flags for use in virDomainSaveFlags and virDomainManagedSave. + */ +typedef enum { + VIR_DOMAIN_SAVE_DIRECT = 1 << 0, /* Use O_DIRECT while saving */ +} virDomainSaveFlagValues; + int virDomainSave (virDomainPtr domain, const char *to); +int virDomainSaveFlags (virDomainPtr domain, + const char *to, + const char *dxml, + unsigned int flags); int virDomainRestore (virConnectPtr conn, const char *from); diff --git a/src/driver.h b/src/driver.h index 70ea4c2..29d4df1 100644 --- a/src/driver.h +++ b/src/driver.h @@ -178,6 +178,11 @@ typedef int (*virDrvDomainSave) (virDomainPtr domain, const char *to); typedef int + (*virDrvDomainSaveFlags) (virDomainPtr domain, + const char *to, + const char *dxml, + unsigned int flags); +typedef int (*virDrvDomainRestore) (virConnectPtr conn, const char *from); typedef int @@ -710,6 +715,7 @@ struct _virDriver { virDrvDomainGetState domainGetState; virDrvDomainGetControlInfo domainGetControlInfo; virDrvDomainSave domainSave; + virDrvDomainSaveFlags domainSaveFlags; virDrvDomainRestore domainRestore; virDrvDomainCoreDump domainCoreDump; virDrvDomainScreenshot domainScreenshot; diff --git a/src/libvirt.c b/src/libvirt.c index fc17f2b..d280eb7 100644 --- a/src/libvirt.c +++ b/src/libvirt.c @@ -2242,6 +2242,8 @@ error: * listed as running anymore (this may be a problem). * Use virDomainRestore() to restore a domain after saving. * + * See virDomainSaveFlags() for more control. + * * Returns 0 in case of success and -1 in case of failure. */ int @@ -2296,6 +2298,84 @@ error: } /** + * virDomainSaveFlags: + * @domain: a domain object + * @to: path for the output file + * @dxml: (optional) XML config for adjusting guest xml used on restore + * @flags: bitwise-OR of virDomainSaveFlagValues + * + * This method will suspend a domain and save its memory contents to + * a file on disk. After the call, if successful, the domain is not + * listed as running anymore (this may be a problem). + * Use virDomainRestore() to restore a domain after saving. + * + * If the hypervisor supports it, @dxml can be used to alter + * host-specific portions of the domain XML that will be used when + * restoring an image. For example, it is possible to alter the + * backing filename that is associated with a disk device, in order to + * prepare for file renaming done as part of backing up the disk + * device while the domain is stopped. + * + * If @flags includes VIR_DOMAIN_SAVE_DIRECT, then libvirt will attempt + * to use O_DIRECT I/O while creating the file; this can allow less + * pressure on file system cache, but also risks slowing saves to NFS. + * + * Returns 0 in case of success and -1 in case of failure. + */ +int +virDomainSaveFlags(virDomainPtr domain, const char *to, + const char *dxml, unsigned int flags) +{ + virConnectPtr conn; + + VIR_DOMAIN_DEBUG(domain, "to=%s dxml=%s flags=%x", + to, NULLSTR(dxml), flags); + + virResetLastError(); + + if (!VIR_IS_CONNECTED_DOMAIN(domain)) { + virLibDomainError(VIR_ERR_INVALID_DOMAIN, __FUNCTION__); + virDispatchError(NULL); + return -1; + } + if (domain->conn->flags & VIR_CONNECT_RO) { + virLibDomainError(VIR_ERR_OPERATION_DENIED, __FUNCTION__); + goto error; + } + conn = domain->conn; + if (to == NULL) { + virLibDomainError(VIR_ERR_INVALID_ARG, __FUNCTION__); + goto error; + } + + if (conn->driver->domainSaveFlags) { + int ret; + char *absolute_to; + + /* We must absolutize the file path as the save is done out of process */ + if (virFileAbsPath(to, &absolute_to) < 0) { + virLibConnError(VIR_ERR_INTERNAL_ERROR, + _("could not build absolute output file path")); + goto error; + } + + ret = conn->driver->domainSaveFlags(domain, absolute_to, dxml, flags); + + VIR_FREE(absolute_to); + + if (ret < 0) + goto error; + return ret; + } + + virLibConnError(VIR_ERR_NO_SUPPORT, __FUNCTION__); + +error: + virDispatchError(domain->conn); + return -1; +} + +/** * virDomainRestore: * @conn: pointer to the hypervisor connection * @from: path to the input file @@ -2362,6 +2442,14 @@ error: * Note that for remote Xen Daemon the file path will be interpreted in * the remote host. * + * If @flags includes VIR_DUMP_CRASH, then leave the guest shut off with + * a crashed state after the dump completes. If @flags includes + * VIR_DUMP_LIVE, then make the core dump while continuing to allow + * the guest to run; otherwise, the guest is suspended during the dump. + * If @flags includes VIR_DUMP_DIRECT, then libvirt will attempt + * to use O_DIRECT I/O while creating the file; this can allow less + * pressure on file system cache, but also risks slowing saves to NFS. + * * Returns 0 in case of success and -1 in case of failure. */ int @@ -14811,7 +14899,7 @@ error: /** * virDomainManagedSave: * @dom: pointer to the domain - * @flags: optional flags currently unused + * @flags: bitwise-OR of virDomainSaveFlagValues * * This method will suspend a domain and save its memory contents to * a file on disk. After the call, if successful, the domain is not @@ -14821,6 +14909,10 @@ error: * restarted (automatically or via an explicit libvirt call). * As a result any running domain is sure to not have a managed saved image. * + * If @flags includes VIR_DOMAIN_SAVE_DIRECT, then libvirt will attempt + * to use O_DIRECT I/O while creating the file; this can allow less + * pressure on file system cache, but also risks slowing saves to NFS. + * * Returns 0 in case of success or -1 in case of failure */ int virDomainManagedSave(virDomainPtr dom, unsigned int flags) diff --git a/src/libvirt_public.syms b/src/libvirt_public.syms index 5f2541a..0d31b94 100644 --- a/src/libvirt_public.syms +++ b/src/libvirt_public.syms @@ -466,4 +466,9 @@ LIBVIRT_0.9.3 { virNodeGetMemoryStats; } LIBVIRT_0.9.2; +LIBVIRT_0.9.4 { + global: + virDomainSaveFlags; +} LIBVIRT_0.9.3; + # .... define new API here using predicted next version number .... -- 1.7.4.4

On 07/14/2011 08:24 AM, Eric Blake wrote:
In order to choose whether to use O_DIRECT when saving a domain image to a file, we need a new flag. But virDomainSave was implemented before our policy of all new APIs having a flag argument.
/** + * virDomainSaveFlags: + * @domain: a domain object + * @to: path for the output file + * @dxml: (optional) XML config for adjusting guest xml used on restore + * @flags: bitwise-OR of virDomainSaveFlagValues + * + * This method will suspend a domain and save its memory contents to + * a file on disk. After the call, if successful, the domain is not + * listed as running anymore (this may be a problem). + * Use virDomainRestore() to restore a domain after saving. + * + * If the hypervisor supports it, @dxml can be used to alter + * host-specific portions of the domain XML that will be used when + * restoring an image. For example, it is possible to alter the + * backing filename that is associated with a disk device, in order to + * prepare for file renaming done as part of backing up the disk + * device while the domain is stopped.
Oh, I forgot to mention in my cover letter that this is also still on my to-do queue, by reusing the work in the virDomainMigrate2 series to allow compatible changes to the XML (oh, and @dxml is undocumented in virDomainMigrate2 at the moment). Likewise, I plan to add a @dxml argument to virDomainRestoreFlags (some people know in advance what changes they plan to make to the xml before saving, others won't know until restore time what changes have to be accommodated), as well as two new API: /** * virConnectDomainSaveGetXMLDesc: * @conn: a connection object * @file: a file created by virDomainSave[Flags] * @flags: currently unused, pass 0 * * Extract the XML from the saved state file that corresponds to the * domain at the time it was saved. * * Returns the XML, or NULL in case of error. The caller must free() * the returned value. */ char *virConnectDomainSaveGetXMLDesc(virConnectPtr conn, const char *file, unsigned int flags); /** * virConnectDomainSaveModify: * @conn: a connection object * @file: a file created by virDomainSave[Flags] * @dxml: the new XML description to use * @flags: currently unused, pass 0 * * If the hypervisor supports it, @dxml can be used to alter * host-specific portions of the domain XML that will be used when * restoring an image. For example, it is possible to alter the * backing filename that is associated with a disk device, in order * to manage file renames due to snapshots or file backups done * after the domain state was saved. * * Returns 0 on success, -1 on error. */ int virConnectDomainSaveModify(virConnectPtr conn, const char *file, const char *dxml, unsigned int flags); -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On Thu, Jul 14, 2011 at 08:24:28AM -0600, Eric Blake wrote:
In order to choose whether to use O_DIRECT when saving a domain image to a file, we need a new flag. But virDomainSave was implemented before our policy of all new APIs having a flag argument.
* include/libvirt/libvirt.h.in (virDomainSaveFlags): New prototype. * src/libvirt.c (virDomainSaveFlags): New API. * src/libvirt_public.syms: Export it. * src/driver.h (virDrvDomainSaveFlags): New driver callback. --- include/libvirt/libvirt.h.in | 14 ++++++ src/driver.h | 6 +++ src/libvirt.c | 94 +++++++++++++++++++++++++++++++++++++++++- src/libvirt_public.syms | 5 ++ 4 files changed, 118 insertions(+), 1 deletions(-)
diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index d5a7105..d9a8694 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -659,6 +659,7 @@ typedef virDomainMemoryStatStruct *virDomainMemoryStatPtr; typedef enum { VIR_DUMP_CRASH = (1 << 0), /* crash after dump */ VIR_DUMP_LIVE = (1 << 1), /* live dump */ + VIR_DUMP_DIRECT = (1 << 2), /* Use O_DIRECT while saving */
I wonder if we're better off calling this 'BYPASS_CACHE' or something like since, since O_DIRECT is just an impl detail.
+/** + * virDomainSaveFlagValues: + * Flags for use in virDomainSaveFlags and virDomainManagedSave. + */ +typedef enum { + VIR_DOMAIN_SAVE_DIRECT = 1 << 0, /* Use O_DIRECT while saving */ +} virDomainSaveFlagValues;
Same question
+ int virDomainSave (virDomainPtr domain, const char *to); +int virDomainSaveFlags (virDomainPtr domain, + const char *to, + const char *dxml, + unsigned int flags);
ACK to new API design. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 07/19/2011 10:10 AM, Daniel P. Berrange wrote:
On Thu, Jul 14, 2011 at 08:24:28AM -0600, Eric Blake wrote:
In order to choose whether to use O_DIRECT when saving a domain image to a file, we need a new flag. But virDomainSave was implemented before our policy of all new APIs having a flag argument.
* include/libvirt/libvirt.h.in (virDomainSaveFlags): New prototype. * src/libvirt.c (virDomainSaveFlags): New API. * src/libvirt_public.syms: Export it. * src/driver.h (virDrvDomainSaveFlags): New driver callback. --- include/libvirt/libvirt.h.in | 14 ++++++ src/driver.h | 6 +++ src/libvirt.c | 94 +++++++++++++++++++++++++++++++++++++++++- src/libvirt_public.syms | 5 ++ 4 files changed, 118 insertions(+), 1 deletions(-)
diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index d5a7105..d9a8694 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -659,6 +659,7 @@ typedef virDomainMemoryStatStruct *virDomainMemoryStatPtr; typedef enum { VIR_DUMP_CRASH = (1<< 0), /* crash after dump */ VIR_DUMP_LIVE = (1<< 1), /* live dump */ + VIR_DUMP_DIRECT = (1<< 2), /* Use O_DIRECT while saving */
I wonder if we're better off calling this 'BYPASS_CACHE' or something like since, since O_DIRECT is just an impl detail.
Absolutely a good idea! I'll fold that rename into v2, particularly since, if posix_fadvise ever gets smarter on Linux, then we can bypass the cache without using O_DIRECT. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

* src/remote/remote_driver.c (remote_driver): Add new callback. * src/remote/remote_protocol.x (remote_procedure): New RPC. (remote_domain_save_flags_args): New struct. * src/remote_protocol-structs: Update. --- src/remote/remote_driver.c | 1 + src/remote/remote_protocol.x | 11 ++++++++++- src/remote_protocol-structs | 6 ++++++ 3 files changed, 17 insertions(+), 1 deletions(-) diff --git a/src/remote/remote_driver.c b/src/remote/remote_driver.c index 2d5dc15..b90fb08 100644 --- a/src/remote/remote_driver.c +++ b/src/remote/remote_driver.c @@ -4156,6 +4156,7 @@ static virDriver remote_driver = { .domainGetState = remoteDomainGetState, /* 0.9.2 */ .domainGetControlInfo = remoteDomainGetControlInfo, /* 0.9.3 */ .domainSave = remoteDomainSave, /* 0.3.0 */ + .domainSaveFlags = remoteDomainSaveFlags, /* 0.9.4 */ .domainRestore = remoteDomainRestore, /* 0.3.0 */ .domainCoreDump = remoteDomainCoreDump, /* 0.3.0 */ .domainScreenshot = remoteDomainScreenshot, /* 0.9.2 */ diff --git a/src/remote/remote_protocol.x b/src/remote/remote_protocol.x index ee169fd..b0b4f6f 100644 --- a/src/remote/remote_protocol.x +++ b/src/remote/remote_protocol.x @@ -725,6 +725,13 @@ struct remote_domain_save_args { remote_nonnull_string to; }; +struct remote_domain_save_flags_args { + remote_nonnull_domain dom; + remote_nonnull_string to; + remote_string dxml; + unsigned int flags; +}; + struct remote_domain_restore_args { remote_nonnull_string from; }; @@ -2383,7 +2390,9 @@ enum remote_procedure { REMOTE_PROC_NODE_GET_CPU_STATS = 227, /* skipgen skipgen */ REMOTE_PROC_NODE_GET_MEMORY_STATS = 228, /* skipgen skipgen */ REMOTE_PROC_DOMAIN_GET_CONTROL_INFO = 229, /* autogen autogen */ - REMOTE_PROC_DOMAIN_GET_VCPU_PIN_INFO = 230 /* skipgen skipgen */ + REMOTE_PROC_DOMAIN_GET_VCPU_PIN_INFO = 230, /* skipgen skipgen */ + + REMOTE_PROC_DOMAIN_SAVE_FLAGS = 231 /* autogen autogen */ /* * Notice how the entries are grouped in sets of 10 ? diff --git a/src/remote_protocol-structs b/src/remote_protocol-structs index b2de8e9..a24fed9 100644 --- a/src/remote_protocol-structs +++ b/src/remote_protocol-structs @@ -416,6 +416,12 @@ struct remote_domain_save_args { remote_nonnull_domain dom; remote_nonnull_string to; }; +struct remote_domain_save_flags_args { + remote_nonnull_domain dom; + remote_nonnull_string to; + remote_string dxml; + u_int flags; +}; struct remote_domain_restore_args { remote_nonnull_string from; }; -- 1.7.4.4

On 07/14/2011 08:24 AM, Eric Blake wrote:
* src/remote/remote_driver.c (remote_driver): Add new callback. * src/remote/remote_protocol.x (remote_procedure): New RPC. (remote_domain_save_flags_args): New struct. * src/remote_protocol-structs: Update. --- src/remote/remote_driver.c | 1 + src/remote/remote_protocol.x | 11 ++++++++++- src/remote_protocol-structs | 6 ++++++ 3 files changed, 17 insertions(+), 1 deletions(-)
Squash this in if my on-the-wire enum verification goes in first: https://www.redhat.com/archives/libvir-list/2011-July/msg00862.html diff --git i/src/remote_protocol-structs w/src/remote_protocol-structs index e20b741..8e90287 100644 --- i/src/remote_protocol-structs +++ w/src/remote_protocol-structs @@ -1865,4 +1865,5 @@ enum remote_procedure { REMOTE_PROC_NODE_GET_MEMORY_STATS = 228, REMOTE_PROC_DOMAIN_GET_CONTROL_INFO = 229, REMOTE_PROC_DOMAIN_GET_VCPU_PIN_INFO = 230, + REMOTE_PROC_DOMAIN_SAVE_FLAGS = 231, }; -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On Thu, Jul 14, 2011 at 08:24:29AM -0600, Eric Blake wrote:
* src/remote/remote_driver.c (remote_driver): Add new callback. * src/remote/remote_protocol.x (remote_procedure): New RPC. (remote_domain_save_flags_args): New struct. * src/remote_protocol-structs: Update. --- src/remote/remote_driver.c | 1 + src/remote/remote_protocol.x | 11 ++++++++++- src/remote_protocol-structs | 6 ++++++ 3 files changed, 17 insertions(+), 1 deletions(-)
ACK, trivial mapping of public API Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

For all hypervisors that support save, the new API now performs the same functions as the old. * src/libxl/libxl_driver.c (libxlDomainSave): Move guts... (libxlDomainSaveFlags): ...to new function. * src/test/test_driver.c (testDomainSave, testDomainSaveFlags): Likewise. * src/vbox/vbox_tmpl.c (vboxDomainSave, vboxDomainSaveFlags): Likewise. * src/xen/xen_driver.c (xenUnifiedDomainSave) (xenUnifiedDomainSaveFlags): Likewise. * src/qemu/qemu_driver.c (qemudDomainSave): Rename and move guts. (qemuDomainSave, qemuDomainSaveFlags): ...here. (qemudDomainSaveFlag): Rename... (qemuDomainSaveInternal): ...to this, and update callers. --- src/libxl/libxl_driver.c | 17 ++++++++++++++++- src/qemu/qemu_driver.c | 31 ++++++++++++++++++++++++------- src/test/test_driver.c | 20 ++++++++++++++++++-- src/vbox/vbox_tmpl.c | 19 ++++++++++++++++++- src/xen/xen_driver.c | 17 ++++++++++++++++- 5 files changed, 92 insertions(+), 12 deletions(-) diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c index 2bc998d..a88d0dc 100644 --- a/src/libxl/libxl_driver.c +++ b/src/libxl/libxl_driver.c @@ -1906,12 +1906,20 @@ cleanup: } static int -libxlDomainSave(virDomainPtr dom, const char *to) +libxlDomainSaveFlags(virDomainPtr dom, const char *to, const char *dxml, + unsigned int flags) { libxlDriverPrivatePtr driver = dom->conn->privateData; virDomainObjPtr vm; int ret = -1; + virCheckFlags(0, -1); + if (dxml) { + libxlError(VIR_ERR_INVALID_ARG, "%s", + _("xml modification unsupported")); + return -1; + } + libxlDriverLock(driver); vm = virDomainFindByUUID(&driver->domains, dom->uuid); @@ -1938,6 +1946,12 @@ cleanup: } static int +libxlDomainSave(virDomainPtr dom, const char *to) +{ + return libxlDomainSaveFlags(dom, to, NULL, 0); +} + +static int libxlDomainRestore(virConnectPtr conn, const char *from) { libxlDriverPrivatePtr driver = conn->privateData; @@ -3822,6 +3836,7 @@ static virDriver libxlDriver = { .domainGetInfo = libxlDomainGetInfo, /* 0.9.0 */ .domainGetState = libxlDomainGetState, /* 0.9.2 */ .domainSave = libxlDomainSave, /* 0.9.2 */ + .domainSaveFlags = libxlDomainSaveFlags, /* 0.9.4 */ .domainRestore = libxlDomainRestore, /* 0.9.2 */ .domainCoreDump = libxlDomainCoreDump, /* 0.9.2 */ .domainSetVcpus = libxlDomainSetVcpus, /* 0.9.0 */ diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index f33abab..ceeef1c 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -2107,9 +2107,10 @@ qemuCompressProgramName(int compress) * shutdown). So 'vm' must not be referenced by the caller after * this returns (whether returning success or failure). */ -static int qemudDomainSaveFlag(struct qemud_driver *driver, virDomainPtr dom, - virDomainObjPtr vm, const char *path, - int compressed) +static int +qemuDomainSaveInternal(struct qemud_driver *driver, virDomainPtr dom, + virDomainObjPtr vm, const char *path, + int compressed) { char *xml = NULL; struct qemud_save_header header; @@ -2351,13 +2352,22 @@ static bool qemudCompressProgramAvailable(enum qemud_save_formats compress) return true; } -static int qemudDomainSave(virDomainPtr dom, const char *path) +static int +qemuDomainSaveFlags(virDomainPtr dom, const char *path, const char *dxml, + unsigned int flags) { struct qemud_driver *driver = dom->conn->privateData; int compressed; int ret = -1; virDomainObjPtr vm = NULL; + virCheckFlags(0, -1); + if (dxml) { + qemuReportError(VIR_ERR_INVALID_ARG, "%s", + _("xml modification unsupported")); + return -1; + } + qemuDriverLock(driver); if (driver->saveImageFormat == NULL) @@ -2393,7 +2403,7 @@ static int qemudDomainSave(virDomainPtr dom, const char *path) goto cleanup; } - ret = qemudDomainSaveFlag(driver, dom, vm, path, compressed); + ret = qemuDomainSaveInternal(driver, dom, vm, path, compressed); vm = NULL; cleanup: @@ -2404,6 +2414,12 @@ cleanup: return ret; } +static int +qemuDomainSave(virDomainPtr dom, const char *path) +{ + return qemuDomainSaveFlags(dom, path, NULL, 0); +} + static char * qemuDomainManagedSavePath(struct qemud_driver *driver, virDomainObjPtr vm) { char *ret; @@ -2450,7 +2466,7 @@ qemuDomainManagedSave(virDomainPtr dom, unsigned int flags) VIR_INFO("Saving state to %s", name); compressed = QEMUD_SAVE_FORMAT_RAW; - ret = qemudDomainSaveFlag(driver, dom, vm, name, compressed); + ret = qemuDomainSaveInternal(driver, dom, vm, name, compressed); vm = NULL; cleanup: @@ -8578,7 +8594,8 @@ static virDriver qemuDriver = { .domainGetInfo = qemudDomainGetInfo, /* 0.2.0 */ .domainGetState = qemuDomainGetState, /* 0.9.2 */ .domainGetControlInfo = qemuDomainGetControlInfo, /* 0.9.3 */ - .domainSave = qemudDomainSave, /* 0.2.0 */ + .domainSave = qemuDomainSave, /* 0.2.0 */ + .domainSaveFlags = qemuDomainSaveFlags, /* 0.9.4 */ .domainRestore = qemuDomainRestore, /* 0.2.0 */ .domainCoreDump = qemudDomainCoreDump, /* 0.7.0 */ .domainScreenshot = qemuDomainScreenshot, /* 0.9.2 */ diff --git a/src/test/test_driver.c b/src/test/test_driver.c index eb800f5..665a344 100644 --- a/src/test/test_driver.c +++ b/src/test/test_driver.c @@ -1726,8 +1726,9 @@ cleanup: #define TEST_SAVE_MAGIC "TestGuestMagic" -static int testDomainSave(virDomainPtr domain, - const char *path) +static int +testDomainSaveFlags(virDomainPtr domain, const char *path, + const char *dxml, unsigned int flags) { testConnPtr privconn = domain->conn->privateData; char *xml = NULL; @@ -1737,6 +1738,13 @@ static int testDomainSave(virDomainPtr domain, virDomainEventPtr event = NULL; int ret = -1; + virCheckFlags(0, -1); + if (dxml) { + testError(VIR_ERR_INVALID_ARG, "%s", + _("xml modification unsupported")); + return -1; + } + testDriverLock(privconn); privdom = virDomainFindByName(&privconn->domains, domain->name); @@ -1820,6 +1828,13 @@ cleanup: return ret; } +static int +testDomainSave(virDomainPtr domain, + const char *path) +{ + return testDomainSaveFlags(domain, path, NULL, 0); +} + static int testDomainRestore(virConnectPtr conn, const char *path) { @@ -5543,6 +5558,7 @@ static virDriver testDriver = { .domainGetInfo = testGetDomainInfo, /* 0.1.1 */ .domainGetState = testDomainGetState, /* 0.9.2 */ .domainSave = testDomainSave, /* 0.3.2 */ + .domainSaveFlags = testDomainSaveFlags, /* 0.9.4 */ .domainRestore = testDomainRestore, /* 0.3.2 */ .domainCoreDump = testDomainCoreDump, /* 0.3.2 */ .domainSetVcpus = testSetVcpus, /* 0.1.4 */ diff --git a/src/vbox/vbox_tmpl.c b/src/vbox/vbox_tmpl.c index 79423ee..c81d1a9 100644 --- a/src/vbox/vbox_tmpl.c +++ b/src/vbox/vbox_tmpl.c @@ -1979,13 +1979,23 @@ cleanup: return ret; } -static int vboxDomainSave(virDomainPtr dom, const char *path ATTRIBUTE_UNUSED) { +static int +vboxDomainSaveFlags(virDomainPtr dom, const char *path ATTRIBUTE_UNUSED, + const char *dxml, unsigned int flags) +{ VBOX_OBJECT_CHECK(dom->conn, int, -1); IConsole *console = NULL; vboxIID iid = VBOX_IID_INITIALIZER; IMachine *machine = NULL; nsresult rc; + virCheckFlags(0, -1); + if (dxml) { + vboxError(VIR_ERR_INVALID_ARG, "%s", + _("xml modification unsupported")); + return -1; + } + /* VirtualBox currently doesn't support saving to a file * at a location other then the machine folder and thus * setting path to ATTRIBUTE_UNUSED for now, will change @@ -2040,6 +2050,12 @@ static int vboxDomainSave(virDomainPtr dom, const char *path ATTRIBUTE_UNUSED) { } static int +vboxDomainSave(virDomainPtr dom, const char *path) +{ + return vboxDomainSaveFlags(dom, path, NULL, 0); +} + +static int vboxDomainSetVcpusFlags(virDomainPtr dom, unsigned int nvcpus, unsigned int flags) { @@ -8781,6 +8797,7 @@ virDriver NAME(Driver) = { .domainGetInfo = vboxDomainGetInfo, /* 0.6.3 */ .domainGetState = vboxDomainGetState, /* 0.9.2 */ .domainSave = vboxDomainSave, /* 0.6.3 */ + .domainSaveFlags = vboxDomainSaveFlags, /* 0.9.4 */ .domainSetVcpus = vboxDomainSetVcpus, /* 0.7.1 */ .domainSetVcpusFlags = vboxDomainSetVcpusFlags, /* 0.8.5 */ .domainGetVcpusFlags = vboxDomainGetVcpusFlags, /* 0.8.5 */ diff --git a/src/xen/xen_driver.c b/src/xen/xen_driver.c index 83c28db..01c9c4a 100644 --- a/src/xen/xen_driver.c +++ b/src/xen/xen_driver.c @@ -1051,11 +1051,19 @@ xenUnifiedDomainGetState(virDomainPtr dom, } static int -xenUnifiedDomainSave (virDomainPtr dom, const char *to) +xenUnifiedDomainSaveFlags(virDomainPtr dom, const char *to, const char *dxml, + unsigned int flags) { GET_PRIVATE(dom->conn); int i; + virCheckFlags(0, -1); + if (dxml) { + xenUnifiedError(VIR_ERR_INVALID_ARG, "%s", + _("xml modification unsupported")); + return -1; + } + for (i = 0; i < XEN_UNIFIED_NR_DRIVERS; ++i) if (priv->opened[i] && drivers[i]->domainSave && @@ -1066,6 +1074,12 @@ xenUnifiedDomainSave (virDomainPtr dom, const char *to) } static int +xenUnifiedDomainSave(virDomainPtr dom, const char *to) +{ + return xenUnifiedDomainSaveFlags(dom, to, NULL, 0); +} + +static int xenUnifiedDomainRestore (virConnectPtr conn, const char *from) { GET_PRIVATE(conn); @@ -2200,6 +2214,7 @@ static virDriver xenUnifiedDriver = { .domainGetInfo = xenUnifiedDomainGetInfo, /* 0.0.3 */ .domainGetState = xenUnifiedDomainGetState, /* 0.9.2 */ .domainSave = xenUnifiedDomainSave, /* 0.0.3 */ + .domainSaveFlags = xenUnifiedDomainSaveFlags, /* 0.9.4 */ .domainRestore = xenUnifiedDomainRestore, /* 0.0.3 */ .domainCoreDump = xenUnifiedDomainCoreDump, /* 0.1.9 */ .domainSetVcpus = xenUnifiedDomainSetVcpus, /* 0.1.4 */ -- 1.7.4.4

On 07/14/2011 08:24 AM, Eric Blake wrote:
For all hypervisors that support save, the new API now performs the same functions as the old.
* src/libxl/libxl_driver.c (libxlDomainSave): Move guts... (libxlDomainSaveFlags): ...to new function. * src/test/test_driver.c (testDomainSave, testDomainSaveFlags): Likewise. * src/vbox/vbox_tmpl.c (vboxDomainSave, vboxDomainSaveFlags): Likewise.
VBox virDomainSave doesn't work (it doesn't honor the destination file name), and worse has no counterpart virDomainRestore (making it absolutely worthless!). I will split this patch to instead remove vbox's virDomainSave, instead of wrapping it. Or maybe we'll change vbox's save to instead serve as virDomainManagedSave. But the end result is that is should not be part of this patch. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On Thu, Jul 14, 2011 at 08:24:30AM -0600, Eric Blake wrote:
For all hypervisors that support save, the new API now performs the same functions as the old.
* src/libxl/libxl_driver.c (libxlDomainSave): Move guts... (libxlDomainSaveFlags): ...to new function. * src/test/test_driver.c (testDomainSave, testDomainSaveFlags): Likewise. * src/vbox/vbox_tmpl.c (vboxDomainSave, vboxDomainSaveFlags): Likewise. * src/xen/xen_driver.c (xenUnifiedDomainSave) (xenUnifiedDomainSaveFlags): Likewise. * src/qemu/qemu_driver.c (qemudDomainSave): Rename and move guts. (qemuDomainSave, qemuDomainSaveFlags): ...here. (qemudDomainSaveFlag): Rename... (qemuDomainSaveInternal): ...to this, and update callers. --- src/libxl/libxl_driver.c | 17 ++++++++++++++++- src/qemu/qemu_driver.c | 31 ++++++++++++++++++++++++------- src/test/test_driver.c | 20 ++++++++++++++++++-- src/vbox/vbox_tmpl.c | 19 ++++++++++++++++++- src/xen/xen_driver.c | 17 ++++++++++++++++- 5 files changed, 92 insertions(+), 12 deletions(-)
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Wire up the new flag to virsh. Also, the 'dump' command had undocumented flags. * tools/virsh.c (cmdSave, cmdManagedSave, cmdDump): Add new flag. * tools/virsh.pod (save, managedsave, dump): Document flags. --- tools/virsh.c | 12 ++++++++++-- tools/virsh.pod | 17 ++++++++++++++--- 2 files changed, 24 insertions(+), 5 deletions(-) diff --git a/tools/virsh.c b/tools/virsh.c index bd6fea7..01c5b39 100644 --- a/tools/virsh.c +++ b/tools/virsh.c @@ -1530,6 +1530,7 @@ static const vshCmdInfo info_save[] = { }; static const vshCmdOptDef opts_save[] = { + {"direct", VSH_OT_BOOL, 0, N_("use O_DIRECT when saving")}, {"domain", VSH_OT_DATA, VSH_OFLAG_REQ, N_("domain name, id or uuid")}, {"file", VSH_OT_DATA, VSH_OFLAG_REQ, N_("where to save the data")}, {NULL, 0, 0, NULL} @@ -1542,6 +1543,7 @@ cmdSave(vshControl *ctl, const vshCmd *cmd) const char *name = NULL; const char *to = NULL; bool ret = true; + bool direct = vshCommandOptBool(cmd, "direct"); if (!vshConnectionUsability(ctl, ctl->conn)) return false; @@ -1552,7 +1554,8 @@ cmdSave(vshControl *ctl, const vshCmd *cmd) if (!(dom = vshCommandOptDomain(ctl, cmd, &name))) return false; - if (virDomainSave(dom, to) == 0) { + if ((direct ? virDomainSaveFlags(dom, to, NULL, VIR_DOMAIN_SAVE_DIRECT) + : virDomainSave(dom, to)) == 0) { vshPrint(ctl, _("Domain %s saved to %s\n"), name, to); } else { vshError(ctl, _("Failed to save domain %s to %s"), name, to); @@ -1576,6 +1579,7 @@ static const vshCmdInfo info_managedsave[] = { }; static const vshCmdOptDef opts_managedsave[] = { + {"direct", VSH_OT_BOOL, 0, N_("use O_DIRECT when saving")}, {"domain", VSH_OT_DATA, VSH_OFLAG_REQ, N_("domain name, id or uuid")}, {NULL, 0, 0, NULL} }; @@ -1586,6 +1590,7 @@ cmdManagedSave(vshControl *ctl, const vshCmd *cmd) virDomainPtr dom; const char *name; bool ret = true; + bool direct = vshCommandOptBool(cmd, "direct"); if (!vshConnectionUsability(ctl, ctl->conn)) return false; @@ -1593,7 +1598,7 @@ cmdManagedSave(vshControl *ctl, const vshCmd *cmd) if (!(dom = vshCommandOptDomain(ctl, cmd, &name))) return false; - if (virDomainManagedSave(dom, 0) == 0) { + if (virDomainManagedSave(dom, direct ? VIR_DOMAIN_SAVE_DIRECT : 0) == 0) { vshPrint(ctl, _("Domain %s state saved by libvirt\n"), name); } else { vshError(ctl, _("Failed to save domain %s state"), name); @@ -1962,6 +1967,7 @@ static const vshCmdInfo info_dump[] = { static const vshCmdOptDef opts_dump[] = { {"live", VSH_OT_BOOL, 0, N_("perform a live core dump if supported")}, {"crash", VSH_OT_BOOL, 0, N_("crash the domain after core dump")}, + {"direct", VSH_OT_BOOL, 0, N_("use O_DIRECT when saving")}, {"domain", VSH_OT_DATA, VSH_OFLAG_REQ, N_("domain name, id or uuid")}, {"file", VSH_OT_DATA, VSH_OFLAG_REQ, N_("where to dump the core")}, {NULL, 0, 0, NULL} @@ -1989,6 +1995,8 @@ cmdDump(vshControl *ctl, const vshCmd *cmd) flags |= VIR_DUMP_LIVE; if (vshCommandOptBool (cmd, "crash")) flags |= VIR_DUMP_CRASH; + if (vshCommandOptBool(cmd, "direct")) + flags |= VIR_DUMP_DIRECT; if (virDomainCoreDump(dom, to, flags) == 0) { vshPrint(ctl, _("Domain %s dumped to %s\n"), name, to); diff --git a/tools/virsh.pod b/tools/virsh.pod index 1a98ec1..97e7f10 100644 --- a/tools/virsh.pod +++ b/tools/virsh.pod @@ -471,9 +471,16 @@ named by I<format> to a domain XML format. Convert the file I<xml> in domain XML format to the native guest configuration format named by I<format>. -=item B<dump> I<domain-id> I<corefilepath> +=item B<dump> I<domain-id> I<corefilepath> [I<--live>] [I<--crash>] +[I<--direct>] Dumps the core of a domain to a file for analysis. +If I<--live> is specified, the domain continues to run until the core +dump is complete, rather than pausing up front. +If I<--crash> is specified, the domain is halted with a crashed status, +rather than merely left in a paused state. +If I<--direct> is specified, the save uses O_DIRECT, which reduces +file system cache pressure but may slow down the operation. =item B<dumpxml> I<domain-id> optional I<--inactive> I<--security-info> I<--update-cpu> @@ -507,11 +514,13 @@ except that it does some error checking. The editor used can be supplied by the C<$VISUAL> or C<$EDITOR> environment variables, and defaults to C<vi>. -=item B<managedsave> I<domain-id> +=item B<managedsave> I<domain-id> [I<--direct>] Save and destroy (stop) a running domain, so it can be restarted from the same state at a later time. When the virsh B<start> command is next run for the domain, it will automatically be started from this saved state. +If I<--direct> is specified, the save uses O_DIRECT, which reduces +file system cache pressure but may slow down the operation. =item B<managedsave-remove> I<domain-id> @@ -590,13 +599,15 @@ should not reuse the saved state file for a second B<restore> unless you have also reverted all storage volumes back to the same contents as when the state file was created. -=item B<save> I<domain-id> I<state-file> +=item B<save> I<domain-id> I<state-file> [I<--direct>] Saves a running domain (RAM, but not disk state) to a state file so that it can be restored later. Once saved, the domain will no longer be running on the system, thus the memory allocated for the domain will be free for other domains to use. B<virsh restore> restores from this state file. +If I<--direct> is specified, the save uses O_DIRECT, which reduces +file system cache pressure but may slow down the operation. This is roughly equivalent to doing a hibernate on a running computer, with all the same limitations. Open network connections may be -- 1.7.4.4

On Thu, Jul 14, 2011 at 08:24:31AM -0600, Eric Blake wrote:
Wire up the new flag to virsh. Also, the 'dump' command had undocumented flags.
* tools/virsh.c (cmdSave, cmdManagedSave, cmdDump): Add new flag. * tools/virsh.pod (save, managedsave, dump): Document flags. --- tools/virsh.c | 12 ++++++++++-- tools/virsh.pod | 17 ++++++++++++++--- 2 files changed, 24 insertions(+), 5 deletions(-)
diff --git a/tools/virsh.c b/tools/virsh.c index bd6fea7..01c5b39 100644 --- a/tools/virsh.c +++ b/tools/virsh.c @@ -1530,6 +1530,7 @@ static const vshCmdInfo info_save[] = { };
static const vshCmdOptDef opts_save[] = { + {"direct", VSH_OT_BOOL, 0, N_("use O_DIRECT when saving")},
My question about flag naming also applies here
- if (virDomainSave(dom, to) == 0) { + if ((direct ? virDomainSaveFlags(dom, to, NULL, VIR_DOMAIN_SAVE_DIRECT) + : virDomainSave(dom, to)) == 0) {
Nit pick, for checking return values I prefr '< 0' or '>= 0' rather than '== 0' Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Rather than making the iohelper subject to a race in reopening the file, it is nicer to pass an already-open fd by inheritance. The old synopsis form must continue to work - if someone updates their libvirt package and installs a new libvirt_iohelper but without restarting the old libvirtd daemon, then the daemon can still make calls using the old syntax but the new iohelper. * src/util/iohelper.c (runIO): Split code for open... (prepare): ...to new function. (usage): Update synopsis. (main): Allow alternate calling form. * src/fdstream.c (virFDStreamOpenFileInternal): Use alternate form. --- src/fdstream.c | 32 ++++-------- src/util/iohelper.c | 142 +++++++++++++++++++++++++++++++++++--------------- 2 files changed, 110 insertions(+), 64 deletions(-) diff --git a/src/fdstream.c b/src/fdstream.c index dd742e1..754431b 100644 --- a/src/fdstream.c +++ b/src/fdstream.c @@ -535,6 +535,14 @@ virFDStreamOpenFileInternal(virStreamPtr st, goto error; } + if (offset && + lseek(fd, offset, SEEK_SET) != offset) { + virReportSystemError(errno, + _("Unable to seek %s to %llu"), + path, offset); + goto error; + } + /* Thanks to the POSIX i/o model, we can't reliably get * non-blocking I/O on block devs/regular files. To * support those we need to fork a helper process todo @@ -545,14 +553,13 @@ virFDStreamOpenFileInternal(virStreamPtr st, !S_ISFIFO(sb.st_mode))) { int childfd; - if ((oflags & O_RDWR) == O_RDWR) { + if ((oflags & O_ACCMODE) == O_RDWR) { streamsReportError(VIR_ERR_INTERNAL_ERROR, _("%s: Cannot request read and write flags together"), path); goto error; } - VIR_FORCE_CLOSE(fd); if (pipe(fds) < 0) { virReportSystemError(errno, "%s", _("Unable to create pipe")); @@ -562,18 +569,9 @@ virFDStreamOpenFileInternal(virStreamPtr st, cmd = virCommandNewArgList(LIBEXECDIR "/libvirt_iohelper", path, NULL); - virCommandAddArgFormat(cmd, "%d", oflags); - virCommandAddArgFormat(cmd, "%d", mode); - virCommandAddArgFormat(cmd, "%llu", offset); virCommandAddArgFormat(cmd, "%llu", length); - virCommandAddArgFormat(cmd, "%u", delete); - - /* when running iohelper we don't want to delete file now, - * because a race condition may occur in which we delete it - * before iohelper even opens it. We want iohelper to remove - * the file instead. - */ - delete = false; + virCommandTransferFD(cmd, fd); + virCommandAddArgFormat(cmd, "%d", fd); if (oflags == O_RDONLY) { childfd = fds[1]; @@ -590,14 +588,6 @@ virFDStreamOpenFileInternal(virStreamPtr st, goto error; VIR_FORCE_CLOSE(childfd); - } else { - if (offset && - lseek(fd, offset, SEEK_SET) != offset) { - virReportSystemError(errno, - _("Unable to seek %s to %llu"), - path, offset); - goto error; - } } if (virFDStreamOpenInternal(st, fd, cmd, errfd, length) < 0) diff --git a/src/util/iohelper.c b/src/util/iohelper.c index 0368eba..1185bde 100644 --- a/src/util/iohelper.c +++ b/src/util/iohelper.c @@ -42,19 +42,11 @@ #define VIR_FROM_THIS VIR_FROM_STORAGE -static int runIO(const char *path, - int oflags, - int mode, - unsigned long long offset, - unsigned long long length) +static int +prepare(const char *path, int oflags, int mode, + unsigned long long offset) { - char *buf = NULL; - size_t buflen = 1024*1024; - int fd; - int ret = -1; - int fdin, fdout; - const char *fdinname, *fdoutname; - unsigned long long total = 0; + int fd = -1; if (oflags & O_CREAT) { fd = open(path, oflags, mode); @@ -70,10 +62,25 @@ static int runIO(const char *path, if (lseek(fd, offset, SEEK_SET) < 0) { virReportSystemError(errno, _("Unable to seek %s to %llu"), path, offset); + VIR_FORCE_CLOSE(fd); goto cleanup; } } +cleanup: + return fd; +} + +static int +runIO(const char *path, int fd, int oflags, unsigned long long length) +{ + char *buf = NULL; + size_t buflen = 1024*1024; + int ret = -1; + int fdin, fdout; + const char *fdinname, *fdoutname; + unsigned long long total = 0; + if (VIR_ALLOC_N(buf, buflen) < 0) { virReportOOMError(); goto cleanup; @@ -138,61 +145,109 @@ cleanup: return ret; } -int main(int argc, char **argv) +static const char *program_name; + +ATTRIBUTE_NORETURN static void +usage(int status) +{ + if (status) { + fprintf(stderr, _("%s: try --help for more details"), program_name); + } else { + printf(_("Usage: %s FILENAME OFLAGS MODE OFFSET LENGTH DELETE\n" + " or: %s FILENAME LENGTH FD\n"), + program_name, program_name); + } + exit(status); +} + +int +main(int argc, char **argv) { const char *path; virErrorPtr err; unsigned long long offset; unsigned long long length; - int oflags; + int oflags = -1; int mode; - unsigned int delete; + unsigned int delete = 0; + int fd = -1; + int lengthIndex = 0; + + program_name = argv[0]; if (setlocale(LC_ALL, "") == NULL || bindtextdomain(PACKAGE, LOCALEDIR) == NULL || textdomain(PACKAGE) == NULL) { - fprintf(stderr, _("%s: initialization failed\n"), argv[0]); + fprintf(stderr, _("%s: initialization failed\n"), program_name); exit(EXIT_FAILURE); } if (virThreadInitialize() < 0 || virErrorInitialize() < 0 || virRandomInitialize(time(NULL) ^ getpid())) { - fprintf(stderr, _("%s: initialization failed\n"), argv[0]); - exit(EXIT_FAILURE); - } - - if (argc != 7) { - fprintf(stderr, _("%s: syntax FILENAME FLAGS MODE OFFSET LENGTH DELETE\n"), argv[0]); + fprintf(stderr, _("%s: initialization failed\n"), program_name); exit(EXIT_FAILURE); } path = argv[1]; - if (virStrToLong_i(argv[2], NULL, 10, &oflags) < 0) { - fprintf(stderr, _("%s: malformed file flags %s"), argv[0], argv[2]); - exit(EXIT_FAILURE); - } - - if (virStrToLong_i(argv[3], NULL, 10, &mode) < 0) { - fprintf(stderr, _("%s: malformed file mode %s"), argv[0], argv[3]); - exit(EXIT_FAILURE); + if (argc > 1 && STREQ(argv[1], "--help")) + usage(EXIT_SUCCESS); + if (argc == 7) { /* FILENAME OFLAGS MODE OFFSET LENGTH DELETE */ + lengthIndex = 5; + if (virStrToLong_i(argv[2], NULL, 10, &oflags) < 0) { + fprintf(stderr, _("%s: malformed file flags %s"), + program_name, argv[2]); + exit(EXIT_FAILURE); + } + if (virStrToLong_i(argv[3], NULL, 10, &mode) < 0) { + fprintf(stderr, _("%s: malformed file mode %s"), + program_name, argv[3]); + exit(EXIT_FAILURE); + } + if (virStrToLong_ull(argv[4], NULL, 10, &offset) < 0) { + fprintf(stderr, _("%s: malformed file offset %s"), + program_name, argv[4]); + exit(EXIT_FAILURE); + } + if (argc == 7 && virStrToLong_ui(argv[6], NULL, 10, &delete) < 0) { + fprintf(stderr, _("%s: malformed delete flag %s"), + program_name, argv[6]); + exit(EXIT_FAILURE); + } + fd = prepare(path, oflags, mode, offset); + } else if (argc == 4) { /* FILENAME LENGTH FD */ + lengthIndex = 2; + if (virStrToLong_i(argv[3], NULL, 10, &fd) < 0) { + fprintf(stderr, _("%s: malformed fd %s"), + program_name, argv[3]); + exit(EXIT_FAILURE); + } +#ifdef F_GETFL + oflags = fcntl(fd, F_GETFL); +#else + /* Stupid mingw. */ + if (fd == STDIN_FILENO) + oflags = O_RDONLY; + else if (fd == STDOUT_FILENO) + oflags = O_WRONLY; +#endif + if (oflags < 0) { + fprintf(stderr, _("%s: unable to determine access mode of fd %d"), + program_name, fd); + exit(EXIT_FAILURE); + } + } else { /* unknown argc pattern */ + usage(EXIT_FAILURE); } - if (virStrToLong_ull(argv[4], NULL, 10, &offset) < 0) { - fprintf(stderr, _("%s: malformed file offset %s"), argv[0], argv[4]); - exit(EXIT_FAILURE); - } - if (virStrToLong_ull(argv[5], NULL, 10, &length) < 0) { - fprintf(stderr, _("%s: malformed file length %s"), argv[0], argv[5]); - exit(EXIT_FAILURE); - } - if (virStrToLong_ui(argv[6], NULL, 10, &delete) < 0) { - fprintf(stderr, _("%s: malformed delete flag %s"), argv[0],argv[6]); + if (virStrToLong_ull(argv[lengthIndex], NULL, 10, &length) < 0) { + fprintf(stderr, _("%s: malformed file length %s"), + program_name, argv[lengthIndex]); exit(EXIT_FAILURE); } - if (runIO(path, oflags, mode, offset, length) < 0) + if (fd < 0 || runIO(path, fd, oflags, length) < 0) goto error; if (delete) @@ -203,9 +258,10 @@ int main(int argc, char **argv) error: err = virGetLastError(); if (err) { - fprintf(stderr, "%s: %s\n", argv[0], err->message); + fprintf(stderr, "%s: %s\n", program_name, err->message); } else { - fprintf(stderr, _("%s: unknown failure with %s\n"), argv[0], path); + fprintf(stderr, _("%s: unknown failure with %s\n"), + program_name, path); } exit(EXIT_FAILURE); } -- 1.7.4.4

On 07/14/2011 08:24 AM, Eric Blake wrote:
Rather than making the iohelper subject to a race in reopening the file, it is nicer to pass an already-open fd by inheritance.
The old synopsis form must continue to work - if someone updates their libvirt package and installs a new libvirt_iohelper but without restarting the old libvirtd daemon, then the daemon can still make calls using the old syntax but the new iohelper.
* src/util/iohelper.c (runIO): Split code for open... (prepare): ...to new function. (usage): Update synopsis. (main): Allow alternate calling form. * src/fdstream.c (virFDStreamOpenFileInternal): Use alternate form. --- src/fdstream.c | 32 ++++-------- src/util/iohelper.c | 142 +++++++++++++++++++++++++++++++++++--------------- 2 files changed, 110 insertions(+), 64 deletions(-)
Squash this in if my v2 command patch goes in first: https://www.redhat.com/archives/libvir-list/2011-July/msg00864.html diff --git i/src/fdstream.c w/src/fdstream.c index 754431b..b70c831 100644 --- i/src/fdstream.c +++ w/src/fdstream.c @@ -570,7 +570,7 @@ virFDStreamOpenFileInternal(virStreamPtr st, path, NULL); virCommandAddArgFormat(cmd, "%llu", length); - virCommandTransferFD(cmd, fd); + virCommandTransferFD(cmd, &fd); virCommandAddArgFormat(cmd, "%d", fd); if (oflags == O_RDONLY) { -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On Thu, Jul 14, 2011 at 08:24:32AM -0600, Eric Blake wrote:
Rather than making the iohelper subject to a race in reopening the file, it is nicer to pass an already-open fd by inheritance.
The old synopsis form must continue to work - if someone updates their libvirt package and installs a new libvirt_iohelper but without restarting the old libvirtd daemon, then the daemon can still make calls using the old syntax but the new iohelper.
* src/util/iohelper.c (runIO): Split code for open... (prepare): ...to new function. (usage): Update synopsis. (main): Allow alternate calling form. * src/fdstream.c (virFDStreamOpenFileInternal): Use alternate form. --- src/fdstream.c | 32 ++++-------- src/util/iohelper.c | 142 +++++++++++++++++++++++++++++++++++--------------- 2 files changed, 110 insertions(+), 64 deletions(-)
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Required for a coming patch where iohelper will operate on O_DIRECT fds. There, the user-space memory must be aligned to file system boundaries (at least page-aligned works best, and some file systems prefer 64k). Made tougher by the fact that VIR_ALLOC won't work on void *, but posix_memalign won't work on char * and isn't available everywhere. This patch makes some simplifying assumptions - namely, output to an O_DIRECT fd will only be attempted on an empty seekable file (hence, no need to worry about preserving existing data on a partial block, and ftruncate will work to undo the effects of having to round up the size of the last block written). * configure.ac (AC_CHECK_FUNCS_ONCE): Check for posix_memalign. * src/util/iohelper.c (runIO): Use aligned memory, and handle quirks of O_DIRECT on last write. --- This one took me the longest time to get right, so a careful review is appreciated. configure.ac | 6 +++--- src/util/iohelper.c | 45 ++++++++++++++++++++++++++++++++++++++++++--- 2 files changed, 45 insertions(+), 6 deletions(-) diff --git a/configure.ac b/configure.ac index e9d5be4..9e39f44 100644 --- a/configure.ac +++ b/configure.ac @@ -121,9 +121,9 @@ AC_CHECK_SIZEOF([long]) dnl Availability of various common functions (non-fatal if missing), dnl and various less common threadsafe functions -AC_CHECK_FUNCS_ONCE([cfmakeraw regexec sched_getaffinity getuid getgid \ - geteuid initgroups posix_fallocate mmap kill \ - getmntent_r getgrnam_r getpwuid_r]) +AC_CHECK_FUNCS_ONCE([cfmakeraw geteuid getgid getgrnam_r getmntent_r \ + getpwuid_r getuid initgroups kill mmap posix_fallocate posix_memalign \ + regexec sched_getaffinity]) dnl Availability of pthread functions (if missing, win32 threading is dnl assumed). Because of $LIB_PTHREAD, we cannot use AC_CHECK_FUNCS_ONCE. diff --git a/src/util/iohelper.c b/src/util/iohelper.c index 1185bde..82f62e1 100644 --- a/src/util/iohelper.c +++ b/src/util/iohelper.c @@ -74,17 +74,32 @@ cleanup: static int runIO(const char *path, int fd, int oflags, unsigned long long length) { - char *buf = NULL; + void *base = NULL; /* Location to be freed */ + char *buf = NULL; /* Aligned location within base */ size_t buflen = 1024*1024; + intptr_t alignMask = 64*1024 - 1; int ret = -1; int fdin, fdout; const char *fdinname, *fdoutname; unsigned long long total = 0; + bool direct = O_DIRECT && ((oflags & O_DIRECT) != 0); + bool shortRead = false; /* true if we hit a short read */ + off_t end = 0; - if (VIR_ALLOC_N(buf, buflen) < 0) { +#if HAVE_POSIX_MEMALIGN + if (posix_memalign(&base, alignMask + 1, buflen)) { virReportOOMError(); goto cleanup; } + buf = base; +#else + if (VIR_ALLOC_N(buf, buflen + alignMask) < 0) { + virReportOOMError(); + goto cleanup; + } + base = buf; + buf = (char *) (((intptr_t) base + alignMask) & alignMask); +#endif switch (oflags & O_ACCMODE) { case O_RDONLY: @@ -98,6 +113,13 @@ runIO(const char *path, int fd, int oflags, unsigned long long length) fdinname = "stdin"; fdout = fd; fdoutname = path; + /* To make the implementation simpler, we give up on any + * attempt to use O_DIRECT in a non-trivial manner. */ + if (direct && (end = lseek(fd, 0, SEEK_END)) != 0) { + virReportSystemError(end < 0 ? errno : EINVAL, "%s", + _("O_DIRECT needs empty seekable file")); + goto cleanup; + } break; case O_RDWR: @@ -124,12 +146,29 @@ runIO(const char *path, int fd, int oflags, unsigned long long length) } if (got == 0) break; /* End of file before end of requested data */ + if (got < buflen || (buflen & alignMask)) { + /* O_DIRECT can handle at most one short read, at end of file */ + if (direct && shortRead) { + virReportSystemError(EINVAL, "%s", + _("Too many short reads for O_DIRECT")); + } + shortRead = true; + } total += got; + if (fdout == fd && direct && shortRead) { + end = total; + memset(buf + got, 0, buflen - got); + got = (got + alignMask) & ~alignMask; + } if (safewrite(fdout, buf, got) < 0) { virReportSystemError(errno, _("Unable to write %s"), fdoutname); goto cleanup; } + if (end && ftruncate(fd, end) < 0) { + virReportSystemError(errno, _("Unable to truncate %s"), fdoutname); + goto cleanup; + } } ret = 0; @@ -141,7 +180,7 @@ cleanup: ret = -1; } - VIR_FREE(buf); + VIR_FREE(base); return ret; } -- 1.7.4.4

On Thu, Jul 14, 2011 at 08:24:33AM -0600, Eric Blake wrote:
Required for a coming patch where iohelper will operate on O_DIRECT fds. There, the user-space memory must be aligned to file system boundaries (at least page-aligned works best, and some file systems prefer 64k). Made tougher by the fact that VIR_ALLOC won't work on void *, but posix_memalign won't work on char * and isn't available everywhere.
This patch makes some simplifying assumptions - namely, output to an O_DIRECT fd will only be attempted on an empty seekable file (hence, no need to worry about preserving existing data on a partial block, and ftruncate will work to undo the effects of having to round up the size of the last block written).
* configure.ac (AC_CHECK_FUNCS_ONCE): Check for posix_memalign. * src/util/iohelper.c (runIO): Use aligned memory, and handle quirks of O_DIRECT on last write. ---
This one took me the longest time to get right, so a careful review is appreciated.
configure.ac | 6 +++--- src/util/iohelper.c | 45 ++++++++++++++++++++++++++++++++++++++++++--- 2 files changed, 45 insertions(+), 6 deletions(-)
diff --git a/configure.ac b/configure.ac index e9d5be4..9e39f44 100644 --- a/configure.ac +++ b/configure.ac @@ -121,9 +121,9 @@ AC_CHECK_SIZEOF([long])
dnl Availability of various common functions (non-fatal if missing), dnl and various less common threadsafe functions -AC_CHECK_FUNCS_ONCE([cfmakeraw regexec sched_getaffinity getuid getgid \ - geteuid initgroups posix_fallocate mmap kill \ - getmntent_r getgrnam_r getpwuid_r]) +AC_CHECK_FUNCS_ONCE([cfmakeraw geteuid getgid getgrnam_r getmntent_r \ + getpwuid_r getuid initgroups kill mmap posix_fallocate posix_memalign \ + regexec sched_getaffinity])
dnl Availability of pthread functions (if missing, win32 threading is dnl assumed). Because of $LIB_PTHREAD, we cannot use AC_CHECK_FUNCS_ONCE. diff --git a/src/util/iohelper.c b/src/util/iohelper.c index 1185bde..82f62e1 100644 --- a/src/util/iohelper.c +++ b/src/util/iohelper.c @@ -74,17 +74,32 @@ cleanup: static int runIO(const char *path, int fd, int oflags, unsigned long long length) { - char *buf = NULL; + void *base = NULL; /* Location to be freed */ + char *buf = NULL; /* Aligned location within base */ size_t buflen = 1024*1024; + intptr_t alignMask = 64*1024 - 1; int ret = -1; int fdin, fdout; const char *fdinname, *fdoutname; unsigned long long total = 0; + bool direct = O_DIRECT && ((oflags & O_DIRECT) != 0); + bool shortRead = false; /* true if we hit a short read */ + off_t end = 0;
- if (VIR_ALLOC_N(buf, buflen) < 0) { +#if HAVE_POSIX_MEMALIGN + if (posix_memalign(&base, alignMask + 1, buflen)) { virReportOOMError(); goto cleanup; } + buf = base; +#else + if (VIR_ALLOC_N(buf, buflen + alignMask) < 0) { + virReportOOMError(); + goto cleanup; + } + base = buf; + buf = (char *) (((intptr_t) base + alignMask) & alignMask); +#endif
switch (oflags & O_ACCMODE) { case O_RDONLY: @@ -98,6 +113,13 @@ runIO(const char *path, int fd, int oflags, unsigned long long length) fdinname = "stdin"; fdout = fd; fdoutname = path; + /* To make the implementation simpler, we give up on any + * attempt to use O_DIRECT in a non-trivial manner. */ + if (direct && (end = lseek(fd, 0, SEEK_END)) != 0) { + virReportSystemError(end < 0 ? errno : EINVAL, "%s", + _("O_DIRECT needs empty seekable file")); + goto cleanup; + } break;
case O_RDWR: @@ -124,12 +146,29 @@ runIO(const char *path, int fd, int oflags, unsigned long long length) } if (got == 0) break; /* End of file before end of requested data */ + if (got < buflen || (buflen & alignMask)) { + /* O_DIRECT can handle at most one short read, at end of file */ + if (direct && shortRead) { + virReportSystemError(EINVAL, "%s", + _("Too many short reads for O_DIRECT")); + } + shortRead = true; + }
total += got; + if (fdout == fd && direct && shortRead) { + end = total; + memset(buf + got, 0, buflen - got); + got = (got + alignMask) & ~alignMask; + } if (safewrite(fdout, buf, got) < 0) { virReportSystemError(errno, _("Unable to write %s"), fdoutname); goto cleanup; } + if (end && ftruncate(fd, end) < 0) { + virReportSystemError(errno, _("Unable to truncate %s"), fdoutname); + goto cleanup; + } }
ret = 0; @@ -141,7 +180,7 @@ cleanup: ret = -1; }
- VIR_FREE(buf); + VIR_FREE(base); return ret; }
ACK, I'm wondering if we should move the guts of iohelper out into a function in src/util which we can unit test. Or perhaps its easy enough to just unit test the fdstream code and thus iohelper Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

O_DIRECT has stringent requirements - I/O must occur with buffers that have both alignment and size as multiples of the file system block size (used to be 512 bytes, but these days, 4k is safer, and 64k allows for better throughput). Rather than make lots of changes at each site that wants to use O_DIRECT, it is easier to offload the work through a helper process that mirrors the I/O between a pipe and the actual direct fd, so that the other end of the pipe no longer has to worry about constraints. * src/util/virdirect.h: New file. * src/util/virdirect.c: Likewise. * src/Makefile.am (UTIL_SOURCES): Build them. * src/libvirt_private.syms: Export new symbols. * cfg.mk (useless_free_options): Add to list. * po/POTFILES.in: Translate new file. --- cfg.mk | 1 + po/POTFILES.in | 1 + src/Makefile.am | 1 + src/libvirt_private.syms | 6 ++ src/util/virdirect.c | 149 ++++++++++++++++++++++++++++++++++++++++++++++ src/util/virdirect.h | 37 +++++++++++ 6 files changed, 195 insertions(+), 0 deletions(-) create mode 100644 src/util/virdirect.c create mode 100644 src/util/virdirect.h diff --git a/cfg.mk b/cfg.mk index 2873177..69d2b6a 100644 --- a/cfg.mk +++ b/cfg.mk @@ -97,6 +97,7 @@ useless_free_options = \ --name=virCommandFree \ --name=virConfFreeList \ --name=virConfFreeValue \ + --name=virDirectFdFree \ --name=virDomainChrDefFree \ --name=virDomainChrSourceDefFree \ --name=virDomainControllerDefFree \ diff --git a/po/POTFILES.in b/po/POTFILES.in index 5782cbf..ad031f3 100644 --- a/po/POTFILES.in +++ b/po/POTFILES.in @@ -122,6 +122,7 @@ src/util/storage_file.c src/util/sysinfo.c src/util/util.c src/util/viraudit.c +src/util/virdirect.c src/util/virterror.c src/util/xml.c src/vbox/vbox_MSCOMGlue.c diff --git a/src/Makefile.am b/src/Makefile.am index d19d1ca..ce041b4 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -80,6 +80,7 @@ UTIL_SOURCES = \ util/uuid.c util/uuid.h \ util/util.c util/util.h \ util/viraudit.c util/viraudit.h \ + util/virdirect.c util/virdirect.h \ util/xml.c util/xml.h \ util/virterror.c util/virterror_internal.h diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index f95d341..c78485d 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1085,6 +1085,12 @@ virAuditOpen; virAuditSend; +# virdirect.h +virDirectFdClose; +virDirectFdFree; +virDirectFdNew; + + # virterror_internal.h virDispatchError; virErrorMsg; diff --git a/src/util/virdirect.c b/src/util/virdirect.c new file mode 100644 index 0000000..a3dca77 --- /dev/null +++ b/src/util/virdirect.c @@ -0,0 +1,149 @@ +/* + * direct.c: management of O_DIRECT fds + * + * Copyright (C) 2011 Red Hat, Inc. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ + +#include <config.h> +#include "internal.h" + +#include "virdirect.h" + +#include <fcntl.h> +#include <sys/stat.h> + +#include "command.h" +#include "configmake.h" +#include "files.h" +#include "memory.h" +#include "virterror_internal.h" + +#define VIR_FROM_THIS VIR_FROM_NONE +#define virDirectError(code, ...) \ + virReportErrorHelper(VIR_FROM_THIS, code, __FILE__, \ + __FUNCTION__, __LINE__, __VA_ARGS__) + +/* Opaque type for managing a wrapper around an O_DIRECT fd. For now, + * read-write is not supported, just a single direction. */ +struct _virDirectFd { + virCommandPtr cmd; /* Child iohelper process to do the I/O. */ +}; + +/* Replace *FD (open to an O_DIRECT file or device) with a new (pipe) + * fd that will pass all I/O through a child process, in order to obey + * O_DIRECT restrictions when doing I/O to the original fd. NAME is + * for diagnostics only. On success, the new wrapper object is + * returned, the original fd is no longer open in the parent process, + * and the caller must close the new *FD descriptor. On failure, *FD + * is unchanged, an error message is output, and NULL is returned. */ +virDirectFdPtr +virDirectFdNew(int *fd, const char *name) +{ + virDirectFdPtr ret = NULL; + bool output = false; + int pipefd[2] = { -1, -1 }; + int mode = -1; + + if (VIR_ALLOC(ret) < 0) { + virReportOOMError(); + goto error; + } + if (!O_DIRECT) + return ret; + +#ifdef F_GETFL + /* Mingw lacks F_GETFL, but it also lacks O_DIRECT, in which case + * we don't need a child process in the first place. */ + mode = fcntl(*fd, F_GETFL); +#endif + + if (mode < 0) { + virDirectError(VIR_ERR_INTERNAL_ERROR, _("invalid fd %d for %s"), + *fd, name); + goto error; + } else if ((mode & O_ACCMODE) == O_WRONLY) { + output = true; + } else if ((mode & O_ACCMODE) != O_RDONLY) { + virDirectError(VIR_ERR_INTERNAL_ERROR, _("unexpected mode %x for %s"), + mode & O_ACCMODE, name); + goto error; + } + + if (pipe2(pipefd, O_CLOEXEC) < 0) { + virDirectError(VIR_ERR_INTERNAL_ERROR, + _("unable to create pipe for %s"), name); + goto error; + } + + ret->cmd = virCommandNewArgList(LIBEXECDIR "/libvirt_iohelper", + name, "0", NULL); + if (output) { + virCommandSetInputFD(ret->cmd, pipefd[0]); + virCommandSetOutputFD(ret->cmd, fd); + virCommandAddArg(ret->cmd, "1"); + } else { + virCommandSetInputFD(ret->cmd, *fd); + virCommandSetOutputFD(ret->cmd, &pipefd[1]); + virCommandAddArg(ret->cmd, "0"); + } + + if (virCommandRunAsync(ret->cmd, NULL) < 0) + goto error; + + if (VIR_CLOSE(pipefd[!output]) < 0) { + virDirectError(VIR_ERR_INTERNAL_ERROR, "%s", + _("unable to close pipe")); + goto error; + } + + VIR_FORCE_CLOSE(*fd); + *fd = pipefd[output]; + return ret; + +error: + VIR_FORCE_CLOSE(pipefd[0]); + VIR_FORCE_CLOSE(pipefd[1]); + virDirectFdFree(ret); + return NULL; +} + +/* If DFD is valid, then reap the child process. Return 0 if the + * child process was successfully reaped, or -1 on failure with an + * error emitted. This function intentionally returns 0 when DFD is + * not valid, so that callers can conditionally create a virDirectFd + * wrapper but unconditionally call the cleanup code. */ +int +virDirectFdClose(virDirectFdPtr dfd) +{ + if (!dfd || !O_DIRECT) + return 0; + + return virCommandWait(dfd->cmd, NULL); +} + +/* Free all resources associated with DFD. If virDirectFdClose was + * not previously called, then this forcefully terminates the child + * process. */ +void +virDirectFdFree(virDirectFdPtr dfd) +{ + if (!dfd) + return; + + virCommandFree(dfd->cmd); + VIR_FREE(dfd); +} diff --git a/src/util/virdirect.h b/src/util/virdirect.h new file mode 100644 index 0000000..35f5f6b --- /dev/null +++ b/src/util/virdirect.h @@ -0,0 +1,37 @@ +/* + * direct.h: management of O_DIRECT fds + * + * Copyright (C) 2011 Red Hat, Inc. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ + +#ifndef __VIR_DIRECT_H__ +# define __VIR_DIRECT_H__ + +/* Opaque type for managing a wrapper around an O_DIRECT fd. */ +struct _virDirectFd; + +typedef struct _virDirectFd virDirectFd; +typedef virDirectFd *virDirectFdPtr; + +virDirectFdPtr virDirectFdNew(int *fd, const char *name) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_RETURN_CHECK; + +int virDirectFdClose(virDirectFdPtr dfd); + +void virDirectFdFree(virDirectFdPtr dfd); + +#endif /* __VIR_DIRECT_H__ */ -- 1.7.4.4

On 07/14/2011 08:24 AM, Eric Blake wrote:
O_DIRECT has stringent requirements - I/O must occur with buffers that have both alignment and size as multiples of the file system block size (used to be 512 bytes, but these days, 4k is safer, and 64k allows for better throughput). Rather than make lots of changes at each site that wants to use O_DIRECT, it is easier to offload the work through a helper process that mirrors the I/O between a pipe and the actual direct fd, so that the other end of the pipe no longer has to worry about constraints.
+virDirectFdPtr +virDirectFdNew(int *fd, const char *name) +{ + virDirectFdPtr ret = NULL; + bool output = false; + int pipefd[2] = { -1, -1 }; + int mode = -1; + + if (VIR_ALLOC(ret) < 0) { + virReportOOMError(); + goto error; + } + if (!O_DIRECT) + return ret;
Question - should an attempt to use 'virsh save --direct' on a system that lacks O_DIRECT (think mingw) be rejected, rather than silently ignored? My argument is that --direct is merely an optimization hint - it tries to reduce filesystem cache pollution (possibly at the expense of slower operation), but other than the cache effects, the end result is the same as if O_DIRECT is unavailable. Hence, my decision of silently ignoring it rather than erroring out. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On Thu, Jul 14, 2011 at 08:34:46AM -0600, Eric Blake wrote:
On 07/14/2011 08:24 AM, Eric Blake wrote:
O_DIRECT has stringent requirements - I/O must occur with buffers that have both alignment and size as multiples of the file system block size (used to be 512 bytes, but these days, 4k is safer, and 64k allows for better throughput). Rather than make lots of changes at each site that wants to use O_DIRECT, it is easier to offload the work through a helper process that mirrors the I/O between a pipe and the actual direct fd, so that the other end of the pipe no longer has to worry about constraints.
+virDirectFdPtr +virDirectFdNew(int *fd, const char *name) +{ + virDirectFdPtr ret = NULL; + bool output = false; + int pipefd[2] = { -1, -1 }; + int mode = -1; + + if (VIR_ALLOC(ret) < 0) { + virReportOOMError(); + goto error; + } + if (!O_DIRECT) + return ret;
Question - should an attempt to use 'virsh save --direct' on a system that lacks O_DIRECT (think mingw) be rejected, rather than silently ignored? My argument is that --direct is merely an optimization hint - it tries to reduce filesystem cache pollution (possibly at the expense of slower operation), but other than the cache effects, the end result is the same as if O_DIRECT is unavailable. Hence, my decision of silently ignoring it rather than erroring out.
The motivation for using O_DIRECT is that allowing pollution of the host cache causes stability problems for the host as a whole. As such IMHO, apps would likely want an error back if O_DIRECT cannot be supported, NB, even some Linux filesystems can't do O_DIRECT, so this isn't an obscure mingw32 issue. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 07/19/2011 10:06 AM, Daniel P. Berrange wrote:
The motivation for using O_DIRECT is that allowing pollution of the host cache causes stability problems for the host as a whole. As such IMHO, apps would likely want an error back if O_DIRECT cannot be supported,
NB, even some Linux filesystems can't do O_DIRECT, so this isn't an obscure mingw32 issue.
Conversely, open() on Linux silently ignores unknown flags - so if you are using a really old kernel but newer glibc headers, then O_DIRECT is non-zero and open() succeeds, but you _don't_ get direct I/O. If O_DIRECT is 0, then it is pretty easy to diagnose that the request is unsupported. But if O_DIRECT is non-zero, then how do I tell whether the open(O_DIRECT) really meant that I have direct I/O, or whether it was a nice hint but still ignored and I'm still polluting the file system cache? -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On Tue, Jul 19, 2011 at 10:08:21AM -0600, Eric Blake wrote:
On 07/19/2011 10:06 AM, Daniel P. Berrange wrote:
The motivation for using O_DIRECT is that allowing pollution of the host cache causes stability problems for the host as a whole. As such IMHO, apps would likely want an error back if O_DIRECT cannot be supported,
NB, even some Linux filesystems can't do O_DIRECT, so this isn't an obscure mingw32 issue.
Conversely, open() on Linux silently ignores unknown flags - so if you are using a really old kernel but newer glibc headers, then O_DIRECT is non-zero and open() succeeds, but you _don't_ get direct I/O.
If O_DIRECT is 0, then it is pretty easy to diagnose that the request is unsupported. But if O_DIRECT is non-zero, then how do I tell whether the open(O_DIRECT) really meant that I have direct I/O, or whether it was a nice hint but still ignored and I'm still polluting the file system cache?
Hmm, I could have sworn we've seen QEMU itself fail to start when requesting O_DIRECT on say, tmpfs. Perhaps open() isn't failing, but rather read/write fail ? Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Thu, Jul 14, 2011 at 08:24:34AM -0600, Eric Blake wrote:
O_DIRECT has stringent requirements - I/O must occur with buffers that have both alignment and size as multiples of the file system block size (used to be 512 bytes, but these days, 4k is safer, and 64k allows for better throughput). Rather than make lots of changes at each site that wants to use O_DIRECT, it is easier to offload the work through a helper process that mirrors the I/O between a pipe and the actual direct fd, so that the other end of the pipe no longer has to worry about constraints.
* src/util/virdirect.h: New file. * src/util/virdirect.c: Likewise. * src/Makefile.am (UTIL_SOURCES): Build them. * src/libvirt_private.syms: Export new symbols. * cfg.mk (useless_free_options): Add to list. * po/POTFILES.in: Translate new file. --- cfg.mk | 1 + po/POTFILES.in | 1 + src/Makefile.am | 1 + src/libvirt_private.syms | 6 ++ src/util/virdirect.c | 149 ++++++++++++++++++++++++++++++++++++++++++++++ src/util/virdirect.h | 37 +++++++++++ 6 files changed, 195 insertions(+), 0 deletions(-) create mode 100644 src/util/virdirect.c create mode 100644 src/util/virdirect.h
diff --git a/cfg.mk b/cfg.mk index 2873177..69d2b6a 100644 --- a/cfg.mk +++ b/cfg.mk @@ -97,6 +97,7 @@ useless_free_options = \ --name=virCommandFree \ --name=virConfFreeList \ --name=virConfFreeValue \ + --name=virDirectFdFree \ --name=virDomainChrDefFree \ --name=virDomainChrSourceDefFree \ --name=virDomainControllerDefFree \ diff --git a/po/POTFILES.in b/po/POTFILES.in index 5782cbf..ad031f3 100644 --- a/po/POTFILES.in +++ b/po/POTFILES.in @@ -122,6 +122,7 @@ src/util/storage_file.c src/util/sysinfo.c src/util/util.c src/util/viraudit.c +src/util/virdirect.c src/util/virterror.c src/util/xml.c src/vbox/vbox_MSCOMGlue.c diff --git a/src/Makefile.am b/src/Makefile.am index d19d1ca..ce041b4 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -80,6 +80,7 @@ UTIL_SOURCES = \ util/uuid.c util/uuid.h \ util/util.c util/util.h \ util/viraudit.c util/viraudit.h \ + util/virdirect.c util/virdirect.h \ util/xml.c util/xml.h \ util/virterror.c util/virterror_internal.h
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index f95d341..c78485d 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1085,6 +1085,12 @@ virAuditOpen; virAuditSend;
+# virdirect.h +virDirectFdClose; +virDirectFdFree; +virDirectFdNew;
The principle all seems fine, but I'm wondering about the naming of this. Also perhaps we could put this in the files.h module ? It would be nice to eventually move all virFile named APIs into there and perhaps even rename it to virfile.{c,h} Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 07/19/2011 11:45 AM, Daniel P. Berrange wrote:
On Thu, Jul 14, 2011 at 08:24:34AM -0600, Eric Blake wrote:
O_DIRECT has stringent requirements - I/O must occur with buffers that have both alignment and size as multiples of the file system block size (used to be 512 bytes, but these days, 4k is safer, and 64k allows for better throughput). Rather than make lots of changes at each site that wants to use O_DIRECT, it is easier to offload the work through a helper process that mirrors the I/O between a pipe and the actual direct fd, so that the other end of the pipe no longer has to worry about constraints.
+++ b/src/libvirt_private.syms @@ -1085,6 +1085,12 @@ virAuditOpen; virAuditSend;
+# virdirect.h +virDirectFdClose; +virDirectFdFree; +virDirectFdNew;
The principle all seems fine, but I'm wondering about the naming of this. Also perhaps we could put this in the files.h module ? It would be nice to eventually move all virFile named APIs into there and perhaps even rename it to virfile.{c,h}
Sure, will do that on my v2 respin. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

Wire together the previous patches to support O_DIRECT via API request in qemu. * src/qemu/qemu_driver.c (qemuDomainSaveInternal, doCoreDump): Add parameter. (qemuDomainSaveFlags, qemuDomainManagedSave, qemudDomainCoreDump) (processWatchdogEvent): Update callers. --- src/qemu/qemu_driver.c | 53 ++++++++++++++++++++++++++++++++--------------- 1 files changed, 36 insertions(+), 17 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index ceeef1c..7687e18 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -87,6 +87,7 @@ #include "configmake.h" #include "threadpool.h" #include "locking/lock_manager.h" +#include "virdirect.h" #define VIR_FROM_THIS VIR_FROM_QEMU @@ -2110,7 +2111,7 @@ qemuCompressProgramName(int compress) static int qemuDomainSaveInternal(struct qemud_driver *driver, virDomainPtr dom, virDomainObjPtr vm, const char *path, - int compressed) + int compressed, bool direct) { char *xml = NULL; struct qemud_save_header header; @@ -2125,6 +2126,7 @@ qemuDomainSaveInternal(struct qemud_driver *driver, virDomainPtr dom, int fd = -1; uid_t uid = getuid(); gid_t gid = getgid(); + virDirectFdPtr directFd = NULL; memset(&header, 0, sizeof(header)); memcpy(header.magic, QEMUD_SAVE_MAGIC, sizeof(header.magic)); @@ -2208,14 +2210,14 @@ qemuDomainSaveInternal(struct qemud_driver *driver, virDomainPtr dom, /* First try creating the file as root */ if (!is_reg) { - fd = open(path, O_WRONLY | O_TRUNC); + fd = open(path, O_WRONLY | O_TRUNC | (direct ? O_DIRECT : 0)); if (fd < 0) { virReportSystemError(errno, _("unable to open %s"), path); goto endjob; } } else { - if ((fd = virFileOpenAs(path, O_CREAT|O_TRUNC|O_WRONLY, - S_IRUSR|S_IWUSR, + int oflags = O_CREAT | O_TRUNC | O_WRONLY | (direct ? O_DIRECT : 0); + if ((fd = virFileOpenAs(path, oflags, S_IRUSR | S_IWUSR, uid, gid, 0)) < 0) { /* If we failed as root, and the error was permission-denied (EACCES or EPERM), assume it's on a network-connected share @@ -2261,7 +2263,7 @@ qemuDomainSaveInternal(struct qemud_driver *driver, virDomainPtr dom, /* Retry creating the file as driver->user */ - if ((fd = virFileOpenAs(path, O_CREAT|O_TRUNC|O_WRONLY, + if ((fd = virFileOpenAs(path, oflags, S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP, driver->user, driver->group, VIR_FILE_OPEN_AS_UID)) < 0) { @@ -2279,6 +2281,9 @@ qemuDomainSaveInternal(struct qemud_driver *driver, virDomainPtr dom, } } + if (direct && (directFd = virDirectFdNew(&fd, path)) == NULL) + goto endjob; + /* Write header to file, followed by XML */ if (qemuDomainSaveHeader(fd, path, xml, &header) < 0) { VIR_FORCE_CLOSE(fd); @@ -2294,6 +2299,8 @@ qemuDomainSaveInternal(struct qemud_driver *driver, virDomainPtr dom, virReportSystemError(errno, _("unable to close %s"), path); goto endjob; } + if (virDirectFdClose(directFd) < 0) + goto endjob; ret = 0; @@ -2326,6 +2333,7 @@ endjob: cleanup: VIR_FORCE_CLOSE(fd); + virDirectFdFree(directFd); VIR_FREE(xml); if (ret != 0 && is_reg) unlink(path); @@ -2361,7 +2369,7 @@ qemuDomainSaveFlags(virDomainPtr dom, const char *path, const char *dxml, int ret = -1; virDomainObjPtr vm = NULL; - virCheckFlags(0, -1); + virCheckFlags(VIR_DOMAIN_SAVE_DIRECT, -1); if (dxml) { qemuReportError(VIR_ERR_INVALID_ARG, "%s", _("xml modification unsupported")); @@ -2403,7 +2411,8 @@ qemuDomainSaveFlags(virDomainPtr dom, const char *path, const char *dxml, goto cleanup; } - ret = qemuDomainSaveInternal(driver, dom, vm, path, compressed); + ret = qemuDomainSaveInternal(driver, dom, vm, path, compressed, + (flags & VIR_DOMAIN_SAVE_DIRECT) != 0); vm = NULL; cleanup: @@ -2441,7 +2450,7 @@ qemuDomainManagedSave(virDomainPtr dom, unsigned int flags) int ret = -1; int compressed; - virCheckFlags(0, -1); + virCheckFlags(VIR_DOMAIN_SAVE_DIRECT, -1); qemuDriverLock(driver); vm = virDomainFindByUUID(&driver->domains, dom->uuid); @@ -2466,7 +2475,8 @@ qemuDomainManagedSave(virDomainPtr dom, unsigned int flags) VIR_INFO("Saving state to %s", name); compressed = QEMUD_SAVE_FORMAT_RAW; - ret = qemuDomainSaveInternal(driver, dom, vm, name, compressed); + ret = qemuDomainSaveInternal(driver, dom, vm, name, compressed, + (flags & VIR_DOMAIN_SAVE_DIRECT) != 0); vm = NULL; cleanup: @@ -2550,18 +2560,24 @@ static int doCoreDump(struct qemud_driver *driver, virDomainObjPtr vm, const char *path, - enum qemud_save_formats compress) + enum qemud_save_formats compress, + bool direct) { int fd = -1; int ret = -1; + virDirectFdPtr directFd = NULL; /* Create an empty file with appropriate ownership. */ - if ((fd = open(path, O_CREAT|O_TRUNC|O_WRONLY, S_IRUSR|S_IWUSR)) < 0) { + if ((fd = open(path, O_CREAT|O_TRUNC|O_WRONLY | (direct ? O_DIRECT : 0), + S_IRUSR|S_IWUSR)) < 0) { qemuReportError(VIR_ERR_OPERATION_FAILED, _("failed to create '%s'"), path); goto cleanup; } + if (direct && (directFd = virDirectFdNew(&fd, path)) == NULL) + goto cleanup; + if (qemuMigrationToFile(driver, vm, fd, 0, path, qemuCompressProgramName(compress), true, false) < 0) goto cleanup; @@ -2572,11 +2588,14 @@ doCoreDump(struct qemud_driver *driver, path); goto cleanup; } + if (virDirectFdClose(directFd) < 0) + goto cleanup; ret = 0; cleanup: VIR_FORCE_CLOSE(fd); + virDirectFdClose(directFd); if (ret != 0) unlink(path); return ret; @@ -2620,7 +2639,7 @@ static int qemudDomainCoreDump(virDomainPtr dom, int ret = -1; virDomainEventPtr event = NULL; - virCheckFlags(VIR_DUMP_LIVE | VIR_DUMP_CRASH, -1); + virCheckFlags(VIR_DUMP_LIVE | VIR_DUMP_CRASH | VIR_DUMP_DIRECT, -1); qemuDriverLock(driver); vm = virDomainFindByUUID(&driver->domains, dom->uuid); @@ -2661,7 +2680,8 @@ static int qemudDomainCoreDump(virDomainPtr dom, } } - ret = doCoreDump(driver, vm, path, getCompressionType(driver)); + ret = doCoreDump(driver, vm, path, getCompressionType(driver), + (flags & VIR_DUMP_DIRECT) != 0); if (ret < 0) goto endjob; @@ -2832,10 +2852,9 @@ static void processWatchdogEvent(void *data, void *opaque) goto endjob; } - ret = doCoreDump(driver, - wdEvent->vm, - dumpfile, - getCompressionType(driver)); + /* XXX wire up qemu.conf to support whether to do direct dumps */ + ret = doCoreDump(driver, wdEvent->vm, dumpfile, + getCompressionType(driver), false); if (ret < 0) qemuReportError(VIR_ERR_OPERATION_FAILED, "%s", _("Dump failed")); -- 1.7.4.4

On 07/14/2011 08:24 AM, Eric Blake wrote:
Saving a domain's state creates a large file, which risks polluting the filesystem cache and slowing down a system. If a system has a lot of domians simultaneously being saved (such as the libvirt-guests init script doing managed saves), then this can cause noticeable slowdown due to filesystem thrashing.
This patch series has been successfully tested to do 'virsh save dom file --direct', with lsof(1) used to verify that O_DIRECT was in use, and the resulting file was successfully used with 'virsh restore file'.
P.S. O_DIRECT is a pain. I wish that posix_fallocate(,POSIX_FADV_SEQUENTIAL) and posix_fallocate(,POSIX_FADV_NOREUSE) were stateful - then libvirt could just declare that the fd is intended to be used in order without leaving behind any caching, then pass the fd on to qemu or the compression program, and the kernel would remember the previous hints and automatically handle the issue without having to go through the libvirt_iohelper program. But the Linux kernel isn't there yet. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

When auto-dumping a domain on crash events, let the user configure whether to have the VIR_DUMP_DIRECT flag effects. * src/qemu/qemu.conf (auto_dump_direct): Document new variable. * src/qemu/libvirtd_qemu.aug (vnc_entry): Let augeas parse it. * src/qemu/qemu_conf.h (qemud_driver): Store new preference. * src/qemu/qemu_conf.c (qemudLoadDriverConfig): Parse it. * src/qemu/qemu_driver.c (processWatchdogEvent): Honor it. --- src/qemu/libvirtd_qemu.aug | 1 + src/qemu/qemu.conf | 8 ++++++++ src/qemu/qemu_conf.c | 6 +++++- src/qemu/qemu_conf.h | 3 ++- src/qemu/qemu_driver.c | 4 ++-- 5 files changed, 18 insertions(+), 4 deletions(-) diff --git a/src/qemu/libvirtd_qemu.aug b/src/qemu/libvirtd_qemu.aug index 66858ae..dea6770 100644 --- a/src/qemu/libvirtd_qemu.aug +++ b/src/qemu/libvirtd_qemu.aug @@ -41,6 +41,7 @@ module Libvirtd_qemu = | str_entry "save_image_format" | str_entry "dump_image_format" | str_entry "auto_dump_path" + | bool_entry "auto_dump_direct" | str_entry "hugetlbfs_mount" | bool_entry "relaxed_acs_check" | bool_entry "vnc_allow_host_audio" diff --git a/src/qemu/qemu.conf b/src/qemu/qemu.conf index 934f99b..2a0664d 100644 --- a/src/qemu/qemu.conf +++ b/src/qemu/qemu.conf @@ -208,6 +208,14 @@ # # auto_dump_path = "/var/lib/libvirt/qemu/dump" +# When a domain is configured to be auto-dumped, enabling this flag +# has the same effect as using the VIR_DUMP_DIRECT flag with the +# virDomainCoreDump API. That is, the system uses O_DIRECT if possible, +# which puts less pressure on the file system cache but may cause +# slower operation. +# +# auto_dump_direct = 0 + # If provided by the host and a hugetlbfs mount point is configured, # a guest may request huge page backing. When this mount point is # unspecified here, determination of a host mount point in /proc/mounts diff --git a/src/qemu/qemu_conf.c b/src/qemu/qemu_conf.c index 3d8aba4..cf6cb6b 100644 --- a/src/qemu/qemu_conf.c +++ b/src/qemu/qemu_conf.c @@ -1,7 +1,7 @@ /* * qemu_conf.c: QEMU configuration management * - * Copyright (C) 2006, 2007, 2008, 2009, 2010 Red Hat, Inc. + * Copyright (C) 2006-2011 Red Hat, Inc. * Copyright (C) 2006 Daniel P. Berrange * * This library is free software; you can redistribute it and/or @@ -378,6 +378,10 @@ int qemudLoadDriverConfig(struct qemud_driver *driver, } } + p = virConfGetValue (conf, "auto_dump_direct"); + CHECK_TYPE ("auto_dump_direct", VIR_CONF_LONG); + if (p) driver->autoDumpDirect = true; + p = virConfGetValue (conf, "hugetlbfs_mount"); CHECK_TYPE ("hugetlbfs_mount", VIR_CONF_STRING); if (p && p->str) { diff --git a/src/qemu/qemu_conf.h b/src/qemu/qemu_conf.h index fa4c607..bc025af 100644 --- a/src/qemu/qemu_conf.h +++ b/src/qemu/qemu_conf.h @@ -1,7 +1,7 @@ /* * qemu_conf.h: QEMU configuration management * - * Copyright (C) 2006-2007, 2009-2010 Red Hat, Inc. + * Copyright (C) 2006-2007, 2009-2011 Red Hat, Inc. * Copyright (C) 2006 Daniel P. Berrange * * This library is free software; you can redistribute it and/or @@ -119,6 +119,7 @@ struct qemud_driver { char *dumpImageFormat; char *autoDumpPath; + bool autoDumpDirect; pciDeviceList *activePciHostdevs; diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index e414e5f..d7cd069 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -2863,9 +2863,9 @@ static void processWatchdogEvent(void *data, void *opaque) goto endjob; } - /* XXX wire up qemu.conf to support whether to do direct dumps */ ret = doCoreDump(driver, wdEvent->vm, dumpfile, - getCompressionType(driver), false); + getCompressionType(driver), + driver->autoDumpDirect); if (ret < 0) qemuReportError(VIR_ERR_OPERATION_FAILED, "%s", _("Dump failed")); -- 1.7.4.4

libvirt-guests is a perfect use case for O_DIRECT - lots of filesystem traffic done at system shutdown, where caching is pointless, and startup, where caching gets in the way. Make this a configurable option in the init script, but defaulting to existing behavior. * tools/libvirt-guests.sysconf (SAVE_DIRECT): New variable. * tools/libvirt-guests.init.sh (suspend_guest): Use it. --- tools/libvirt-guests.init.sh | 5 ++++- tools/libvirt-guests.sysconf | 4 ++++ 2 files changed, 8 insertions(+), 1 deletions(-) diff --git a/tools/libvirt-guests.init.sh b/tools/libvirt-guests.init.sh index 30f957a..4e383c4 100644 --- a/tools/libvirt-guests.init.sh +++ b/tools/libvirt-guests.init.sh @@ -43,6 +43,7 @@ ON_BOOT=start ON_SHUTDOWN=suspend SHUTDOWN_TIMEOUT=0 START_DELAY=0 +SAVE_DIRECT=0 test -f "$sysconfdir"/sysconfig/libvirt-guests && . "$sysconfdir"/sysconfig/libvirt-guests @@ -190,8 +191,10 @@ suspend_guest() name=$(guest_name "$uri" "$guest") label=$(eval_gettext "Suspending \$name: ") + direct= + test "x$SAVE_DIRECT" = x0 || direct=--direct printf %s "$label" - run_virsh "$uri" managedsave "$guest" >/dev/null & + run_virsh "$uri" managedsave "$direct" "$guest" >/dev/null & virsh_pid=$! while true; do sleep 1 diff --git a/tools/libvirt-guests.sysconf b/tools/libvirt-guests.sysconf index 37b258e..da465ad 100644 --- a/tools/libvirt-guests.sysconf +++ b/tools/libvirt-guests.sysconf @@ -25,3 +25,7 @@ # number of seconds we're willing to wait for a guest to shut down #SHUTDOWN_TIMEOUT=0 + +# If non-zero, try to use O_DIRECT when saving and restoring guests, +# for less filesystem cache pollution, but possibly slower operation. +#SAVE_DIRECT=0 -- 1.7.4.4

* include/libvirt/libvirt.h.in (virDomainCreateFlags): Add a flag. (virDomainRestoreFlags): New prototype. * src/libvirt.c (virDomainRestoreFlags): New function. * src/libvirt_public.syms: Export it. * src/driver.h (virDrvDomainRestoreFlags): New driver callback. --- I'm debating whether to squash this into 1/8, or leave it separate. include/libvirt/libvirt.h.in | 8 ++++- src/driver.h | 6 +++ src/libvirt.c | 81 ++++++++++++++++++++++++++++++++++++++++++ src/libvirt_public.syms | 1 + 4 files changed, 95 insertions(+), 1 deletions(-) diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index d9a8694..43881b3 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -235,6 +235,7 @@ typedef enum { VIR_DOMAIN_NONE = 0, /* Default behavior */ VIR_DOMAIN_START_PAUSED = 1 << 0, /* Launch guest in paused state */ VIR_DOMAIN_START_AUTODESTROY = 1 << 1, /* Automatically kill guest when virConnectPtr is closed */ + VIR_DOMAIN_START_DIRECT = 1 << 2, /* Use O_DIRECT while restoring */ } virDomainCreateFlags; @@ -945,7 +946,8 @@ int virDomainResume (virDomainPtr domain); /** * virDomainSaveFlagValues: - * Flags for use in virDomainSaveFlags and virDomainManagedSave. + * Flags for use in virDomainSaveFlags(), virDomainManagedSave(), and + * virDomainRestoreFlags(). */ typedef enum { VIR_DOMAIN_SAVE_DIRECT = 1 << 0, /* Use O_DIRECT while saving */ @@ -959,6 +961,10 @@ int virDomainSaveFlags (virDomainPtr domain, unsigned int flags); int virDomainRestore (virConnectPtr conn, const char *from); +int virDomainRestoreFlags (virConnectPtr conn, + const char *from, + const char *dxml, + unsigned int flags); /* * Managed domain save diff --git a/src/driver.h b/src/driver.h index 7615daa..731c3c4 100644 --- a/src/driver.h +++ b/src/driver.h @@ -186,6 +186,11 @@ typedef int (*virDrvDomainRestore) (virConnectPtr conn, const char *from); typedef int + (*virDrvDomainRestoreFlags) (virConnectPtr conn, + const char *from, + const char *dxml, + unsigned int flags); +typedef int (*virDrvDomainCoreDump) (virDomainPtr domain, const char *to, unsigned int flags); @@ -717,6 +722,7 @@ struct _virDriver { virDrvDomainSave domainSave; virDrvDomainSaveFlags domainSaveFlags; virDrvDomainRestore domainRestore; + virDrvDomainRestoreFlags domainRestoreFlags; virDrvDomainCoreDump domainCoreDump; virDrvDomainScreenshot domainScreenshot; virDrvDomainSetVcpus domainSetVcpus; diff --git a/src/libvirt.c b/src/libvirt.c index 75d174d..bfe103b 100644 --- a/src/libvirt.c +++ b/src/libvirt.c @@ -2382,6 +2382,8 @@ error: * * This method will restore a domain saved to disk by virDomainSave(). * + * See virDomainRestoreFlags() for more control. + * * Returns 0 in case of success and -1 in case of failure. */ int @@ -2433,6 +2435,79 @@ error: } /** + * virDomainRestoreFlags: + * @conn: pointer to the hypervisor connection + * @from: path to the input file + * @dxml: (optional) XML config for adjusting guest xml used on restore + * @flags: bitwise-OR of virDomainSaveFlagValues + * + * This method will restore a domain saved to disk by virDomainSave(). + * + * If the hypervisor supports it, @dxml can be used to alter + * host-specific portions of the domain XML that will be used when + * restoring an image. For example, it is possible to alter the + * backing filename that is associated with a disk device, in order to + * prepare for file renaming done as part of backing up the disk + * device while the domain is stopped. + * + * If @flags includes VIR_DOMAIN_SAVE_DIRECT, then libvirt will attempt + * to use O_DIRECT I/O while restoring the file; this can allow less + * pressure on file system cache, but also risks slowing saves to NFS. + * + * Returns 0 in case of success and -1 in case of failure. + */ +int +virDomainRestoreFlags(virConnectPtr conn, const char *from, const char *dxml, + unsigned int flags) +{ + VIR_DEBUG("conn=%p, from=%s, dxml=%s, flags=%x", + conn, from, NULLSTR(dxml), flags); + + virResetLastError(); + + if (!VIR_IS_CONNECT(conn)) { + virLibConnError(VIR_ERR_INVALID_CONN, __FUNCTION__); + virDispatchError(NULL); + return -1; + } + if (conn->flags & VIR_CONNECT_RO) { + virLibConnError(VIR_ERR_OPERATION_DENIED, __FUNCTION__); + goto error; + } + if (from == NULL) { + virLibConnError(VIR_ERR_INVALID_ARG, __FUNCTION__); + goto error; + } + + if (conn->driver->domainRestoreFlags) { + int ret; + char *absolute_from; + + /* We must absolutize the file path as the restore is done out of process */ + if (virFileAbsPath(from, &absolute_from) < 0) { + virLibConnError(VIR_ERR_INTERNAL_ERROR, + _("could not build absolute input file path")); + goto error; + } + + ret = conn->driver->domainRestoreFlags(conn, absolute_from, dxml, + flags); + + VIR_FREE(absolute_from); + + if (ret < 0) + goto error; + return ret; + } + + virLibConnError(VIR_ERR_NO_SUPPORT, __FUNCTION__); + +error: + virDispatchError(conn); + return -1; +} + +/** * virDomainCoreDump: * @domain: a domain object * @to: path for the core file @@ -6651,6 +6726,12 @@ error: * libvirtd daemon. Any domains marked for auto destroy will * block attempts at migration or save-to-file * + * If the VIR_DOMAIN_START_DIRECT flag is set, and there is a + * managed save file for this domain (created by virDomainManagedSave), + * then libvirt will attempt to use O_DIRECT I/O while loading the file; + * this can allow less pressure on file system cache, but also risks + * slowing loads from NFS. + * * Returns 0 in case of success, -1 in case of error */ int diff --git a/src/libvirt_public.syms b/src/libvirt_public.syms index 0d31b94..ffbc234 100644 --- a/src/libvirt_public.syms +++ b/src/libvirt_public.syms @@ -468,6 +468,7 @@ LIBVIRT_0.9.3 { LIBVIRT_0.9.4 { global: + virDomainRestoreFlags; virDomainSaveFlags; } LIBVIRT_0.9.3; -- 1.7.4.4

On Thu, Jul 14, 2011 at 07:02:23PM -0600, Eric Blake wrote:
* include/libvirt/libvirt.h.in (virDomainCreateFlags): Add a flag. (virDomainRestoreFlags): New prototype. * src/libvirt.c (virDomainRestoreFlags): New function. * src/libvirt_public.syms: Export it. * src/driver.h (virDrvDomainRestoreFlags): New driver callback. ---
I'm debating whether to squash this into 1/8, or leave it separate.
include/libvirt/libvirt.h.in | 8 ++++- src/driver.h | 6 +++ src/libvirt.c | 81 ++++++++++++++++++++++++++++++++++++++++++ src/libvirt_public.syms | 1 + 4 files changed, 95 insertions(+), 1 deletions(-)
diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index d9a8694..43881b3 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -235,6 +235,7 @@ typedef enum { VIR_DOMAIN_NONE = 0, /* Default behavior */ VIR_DOMAIN_START_PAUSED = 1 << 0, /* Launch guest in paused state */ VIR_DOMAIN_START_AUTODESTROY = 1 << 1, /* Automatically kill guest when virConnectPtr is closed */ + VIR_DOMAIN_START_DIRECT = 1 << 2, /* Use O_DIRECT while restoring */ } virDomainCreateFlags;
Same note about flag naming as first patch
@@ -945,7 +946,8 @@ int virDomainResume (virDomainPtr domain);
/** * virDomainSaveFlagValues: - * Flags for use in virDomainSaveFlags and virDomainManagedSave. + * Flags for use in virDomainSaveFlags(), virDomainManagedSave(), and + * virDomainRestoreFlags(). */ typedef enum { VIR_DOMAIN_SAVE_DIRECT = 1 << 0, /* Use O_DIRECT while saving */ @@ -959,6 +961,10 @@ int virDomainSaveFlags (virDomainPtr domain, unsigned int flags); int virDomainRestore (virConnectPtr conn, const char *from); +int virDomainRestoreFlags (virConnectPtr conn, + const char *from, + const char *dxml, + unsigned int flags);
ACK to new API Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

* src/remote/remote_driver.c (remote_driver): Add new callback. * src/remote/remote_protocol.x (remote_procdure): New RPC. (remote_domain_restore_flags_args): New struct. * src/remote_protocol-structs: Update. --- Again, I'm debating whether this should be part of 2/8. src/remote/remote_driver.c | 1 + src/remote/remote_protocol.x | 9 ++++++++- src/remote_protocol-structs | 6 ++++++ 3 files changed, 15 insertions(+), 1 deletions(-) diff --git a/src/remote/remote_driver.c b/src/remote/remote_driver.c index 8907bcc..49d632b 100644 --- a/src/remote/remote_driver.c +++ b/src/remote/remote_driver.c @@ -4164,6 +4164,7 @@ static virDriver remote_driver = { .domainSave = remoteDomainSave, /* 0.3.0 */ .domainSaveFlags = remoteDomainSaveFlags, /* 0.9.4 */ .domainRestore = remoteDomainRestore, /* 0.3.0 */ + .domainRestoreFlags = remoteDomainRestoreFlags, /* 0.9.4 */ .domainCoreDump = remoteDomainCoreDump, /* 0.3.0 */ .domainScreenshot = remoteDomainScreenshot, /* 0.9.2 */ .domainSetVcpus = remoteDomainSetVcpus, /* 0.3.0 */ diff --git a/src/remote/remote_protocol.x b/src/remote/remote_protocol.x index b0b4f6f..ac670df 100644 --- a/src/remote/remote_protocol.x +++ b/src/remote/remote_protocol.x @@ -736,6 +736,12 @@ struct remote_domain_restore_args { remote_nonnull_string from; }; +struct remote_domain_restore_flags_args { + remote_nonnull_string from; + remote_string dxml; + unsigned int flags; +}; + struct remote_domain_core_dump_args { remote_nonnull_domain dom; remote_nonnull_string to; @@ -2392,7 +2398,8 @@ enum remote_procedure { REMOTE_PROC_DOMAIN_GET_CONTROL_INFO = 229, /* autogen autogen */ REMOTE_PROC_DOMAIN_GET_VCPU_PIN_INFO = 230, /* skipgen skipgen */ - REMOTE_PROC_DOMAIN_SAVE_FLAGS = 231 /* autogen autogen */ + REMOTE_PROC_DOMAIN_SAVE_FLAGS = 231, /* autogen autogen */ + REMOTE_PROC_DOMAIN_RESTORE_FLAGS = 232 /* autogen autogen */ /* * Notice how the entries are grouped in sets of 10 ? diff --git a/src/remote_protocol-structs b/src/remote_protocol-structs index 8e90287..4e125fd 100644 --- a/src/remote_protocol-structs +++ b/src/remote_protocol-structs @@ -438,6 +438,11 @@ struct remote_domain_save_flags_args { struct remote_domain_restore_args { remote_nonnull_string from; }; +struct remote_domain_restore_flags_args { + remote_nonnull_string from; + remote_string dxml; + u_int flags; +}; struct remote_domain_core_dump_args { remote_nonnull_domain dom; remote_nonnull_string to; @@ -1866,4 +1871,5 @@ enum remote_procedure { REMOTE_PROC_DOMAIN_GET_CONTROL_INFO = 229, REMOTE_PROC_DOMAIN_GET_VCPU_PIN_INFO = 230, REMOTE_PROC_DOMAIN_SAVE_FLAGS = 231, + REMOTE_PROC_DOMAIN_RESTORE_FLAGS = 232, }; -- 1.7.4.4

On Thu, Jul 14, 2011 at 07:03:46PM -0600, Eric Blake wrote:
* src/remote/remote_driver.c (remote_driver): Add new callback. * src/remote/remote_protocol.x (remote_procdure): New RPC. (remote_domain_restore_flags_args): New struct. * src/remote_protocol-structs: Update. ---
Again, I'm debating whether this should be part of 2/8.
src/remote/remote_driver.c | 1 + src/remote/remote_protocol.x | 9 ++++++++- src/remote_protocol-structs | 6 ++++++ 3 files changed, 15 insertions(+), 1 deletions(-)
ACK, trivial Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

For all hypervisors that support restore, the new API now performs the same functions as the old. * src/libxl/libxl_driver.c (libxlDomainRestore): Move guts... (libxlDomainRestoreFlags): ...to new function. * src/qemu/qemu_driver.c (qemuDomainRestore) (qemuDomainRestoreFlags): Likewise. * src/test/test_driver.c (testDomainRestore) (testDomainRestoreFlags): Likewise. * src/xen/xen_driver.c (xenUnifiedDomainRestore) (xenUnifiedDomainRestoreFlags): Likewise. --- Again, this might be worth merging into 3/8. src/libxl/libxl_driver.c | 17 ++++++++++++++++- src/qemu/qemu_driver.c | 21 +++++++++++++++++++-- src/test/test_driver.c | 22 ++++++++++++++++++++-- src/xen/xen_driver.c | 17 ++++++++++++++++- 4 files changed, 71 insertions(+), 6 deletions(-) diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c index d26cba4..a11a3bb 100644 --- a/src/libxl/libxl_driver.c +++ b/src/libxl/libxl_driver.c @@ -1952,7 +1952,8 @@ libxlDomainSave(virDomainPtr dom, const char *to) } static int -libxlDomainRestore(virConnectPtr conn, const char *from) +libxlDomainRestoreFlags(virConnectPtr conn, const char *from, + const char *dxml, unsigned int flags) { libxlDriverPrivatePtr driver = conn->privateData; virDomainObjPtr vm = NULL; @@ -1961,6 +1962,13 @@ libxlDomainRestore(virConnectPtr conn, const char *from) int fd = -1; int ret = -1; + virCheckFlags(0, -1); + if (dxml) { + libxlError(VIR_ERR_INVALID_ARG, "%s", + _("xml modification unsupported")); + return -1; + } + libxlDriverLock(driver); fd = libxlSaveImageOpen(driver, from, &def, &hdr); @@ -1992,6 +2000,12 @@ cleanup: } static int +libxlDomainRestore(virConnectPtr conn, const char *from) +{ + return libxlDomainRestoreFlags(conn, from, NULL, 0); +} + +static int libxlDomainCoreDump(virDomainPtr dom, const char *to, unsigned int flags) { libxlDriverPrivatePtr driver = dom->conn->privateData; @@ -3849,6 +3863,7 @@ static virDriver libxlDriver = { .domainSave = libxlDomainSave, /* 0.9.2 */ .domainSaveFlags = libxlDomainSaveFlags, /* 0.9.4 */ .domainRestore = libxlDomainRestore, /* 0.9.2 */ + .domainRestoreFlags = libxlDomainRestoreFlags, /* 0.9.4 */ .domainCoreDump = libxlDomainCoreDump, /* 0.9.2 */ .domainSetVcpus = libxlDomainSetVcpus, /* 0.9.0 */ .domainSetVcpusFlags = libxlDomainSetVcpusFlags, /* 0.9.0 */ diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 10e1fd8..122365f 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -3800,8 +3800,10 @@ out: } static int -qemuDomainRestore(virConnectPtr conn, - const char *path) +qemuDomainRestoreFlags(virConnectPtr conn, + const char *path, + const char *dxml, + unsigned int flags) { struct qemud_driver *driver = conn->privateData; virDomainDefPtr def = NULL; @@ -3810,6 +3812,13 @@ qemuDomainRestore(virConnectPtr conn, int ret = -1; struct qemud_save_header header; + virCheckFlags(0, -1); + if (dxml) { + qemuReportError(VIR_ERR_INVALID_ARG, "%s", + _("xml modification unsupported")); + return -1; + } + qemuDriverLock(driver); fd = qemuDomainSaveImageOpen(driver, path, &def, &header); @@ -3849,6 +3858,13 @@ cleanup: } static int +qemuDomainRestore(virConnectPtr conn, + const char *path) +{ + return qemuDomainRestoreFlags(conn, path, NULL, 0); +} + +static int qemuDomainObjRestore(virConnectPtr conn, struct qemud_driver *driver, virDomainObjPtr vm, @@ -8570,6 +8586,7 @@ static virDriver qemuDriver = { .domainSave = qemuDomainSave, /* 0.2.0 */ .domainSaveFlags = qemuDomainSaveFlags, /* 0.9.4 */ .domainRestore = qemuDomainRestore, /* 0.2.0 */ + .domainRestoreFlags = qemuDomainRestoreFlags, /* 0.9.4 */ .domainCoreDump = qemudDomainCoreDump, /* 0.7.0 */ .domainScreenshot = qemuDomainScreenshot, /* 0.9.2 */ .domainSetVcpus = qemuDomainSetVcpus, /* 0.4.4 */ diff --git a/src/test/test_driver.c b/src/test/test_driver.c index 74a1d26..e1621f9 100644 --- a/src/test/test_driver.c +++ b/src/test/test_driver.c @@ -1835,8 +1835,11 @@ testDomainSave(virDomainPtr domain, return testDomainSaveFlags(domain, path, NULL, 0); } -static int testDomainRestore(virConnectPtr conn, - const char *path) +static int +testDomainRestoreFlags(virConnectPtr conn, + const char *path, + const char *dxml, + unsigned int flags) { testConnPtr privconn = conn->privateData; char *xml = NULL; @@ -1848,6 +1851,13 @@ static int testDomainRestore(virConnectPtr conn, virDomainEventPtr event = NULL; int ret = -1; + virCheckFlags(0, -1); + if (dxml) { + testError(VIR_ERR_INVALID_ARG, "%s", + _("xml modification unsupported")); + return -1; + } + if ((fd = open(path, O_RDONLY)) < 0) { virReportSystemError(errno, _("cannot read domain image '%s'"), @@ -1924,6 +1934,13 @@ cleanup: return ret; } +static int +testDomainRestore(virConnectPtr conn, + const char *path) +{ + return testDomainRestoreFlags(conn, path, NULL, 0); +} + static int testDomainCoreDump(virDomainPtr domain, const char *to, unsigned int flags) @@ -5573,6 +5590,7 @@ static virDriver testDriver = { .domainSave = testDomainSave, /* 0.3.2 */ .domainSaveFlags = testDomainSaveFlags, /* 0.9.4 */ .domainRestore = testDomainRestore, /* 0.3.2 */ + .domainRestoreFlags = testDomainRestoreFlags, /* 0.9.4 */ .domainCoreDump = testDomainCoreDump, /* 0.3.2 */ .domainSetVcpus = testSetVcpus, /* 0.1.4 */ .domainSetVcpusFlags = testDomainSetVcpusFlags, /* 0.8.5 */ diff --git a/src/xen/xen_driver.c b/src/xen/xen_driver.c index e7f1eb6..542777e 100644 --- a/src/xen/xen_driver.c +++ b/src/xen/xen_driver.c @@ -1080,11 +1080,19 @@ xenUnifiedDomainSave(virDomainPtr dom, const char *to) } static int -xenUnifiedDomainRestore (virConnectPtr conn, const char *from) +xenUnifiedDomainRestoreFlags(virConnectPtr conn, const char *from, + const char *dxml, unsigned int flags) { GET_PRIVATE(conn); int i; + virCheckFlags(0, -1); + if (dxml) { + xenUnifiedError(VIR_ERR_INVALID_ARG, "%s", + _("xml modification unsupported")); + return -1; + } + for (i = 0; i < XEN_UNIFIED_NR_DRIVERS; ++i) if (priv->opened[i] && drivers[i]->domainRestore && @@ -1095,6 +1103,12 @@ xenUnifiedDomainRestore (virConnectPtr conn, const char *from) } static int +xenUnifiedDomainRestore (virConnectPtr conn, const char *from) +{ + return xenUnifiedDomainRestoreFlags(conn, from, NULL, 0); +} + +static int xenUnifiedDomainCoreDump (virDomainPtr dom, const char *to, unsigned int flags) { GET_PRIVATE(dom->conn); @@ -2215,6 +2229,7 @@ static virDriver xenUnifiedDriver = { .domainSave = xenUnifiedDomainSave, /* 0.0.3 */ .domainSaveFlags = xenUnifiedDomainSaveFlags, /* 0.9.4 */ .domainRestore = xenUnifiedDomainRestore, /* 0.0.3 */ + .domainRestoreFlags = xenUnifiedDomainRestoreFlags, /* 0.9.4 */ .domainCoreDump = xenUnifiedDomainCoreDump, /* 0.1.9 */ .domainSetVcpus = xenUnifiedDomainSetVcpus, /* 0.1.4 */ .domainSetVcpusFlags = xenUnifiedDomainSetVcpusFlags, /* 0.8.5 */ -- 1.7.4.4

On Fri, Jul 15, 2011 at 08:05:02PM -0600, Eric Blake wrote:
For all hypervisors that support restore, the new API now performs the same functions as the old.
* src/libxl/libxl_driver.c (libxlDomainRestore): Move guts... (libxlDomainRestoreFlags): ...to new function. * src/qemu/qemu_driver.c (qemuDomainRestore) (qemuDomainRestoreFlags): Likewise. * src/test/test_driver.c (testDomainRestore) (testDomainRestoreFlags): Likewise. * src/xen/xen_driver.c (xenUnifiedDomainRestore) (xenUnifiedDomainRestoreFlags): Likewise. ---
Again, this might be worth merging into 3/8.
src/libxl/libxl_driver.c | 17 ++++++++++++++++- src/qemu/qemu_driver.c | 21 +++++++++++++++++++-- src/test/test_driver.c | 22 ++++++++++++++++++++-- src/xen/xen_driver.c | 17 ++++++++++++++++- 4 files changed, 71 insertions(+), 6 deletions(-)
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

* tools/virsh.c (cmdStart, cmdRestore): Add new flag. * tools/virsh.pod (start, restore): Document flags. --- Counterpart to 4/8 tools/virsh.c | 9 ++++++++- tools/virsh.pod | 10 ++++++++-- 2 files changed, 16 insertions(+), 3 deletions(-) diff --git a/tools/virsh.c b/tools/virsh.c index 3cf0618..00ea533 100644 --- a/tools/virsh.c +++ b/tools/virsh.c @@ -1481,6 +1481,7 @@ static const vshCmdOptDef opts_start[] = { #endif {"paused", VSH_OT_BOOL, 0, N_("leave the guest paused after creation")}, {"autodestroy", VSH_OT_BOOL, 0, N_("automatically destroy the guest when virsh disconnects")}, + {"direct", VSH_OT_BOOL, 0, N_("use O_DIRECT when loading")}, {NULL, 0, 0, NULL} }; @@ -1511,6 +1512,8 @@ cmdStart(vshControl *ctl, const vshCmd *cmd) flags |= VIR_DOMAIN_START_PAUSED; if (vshCommandOptBool(cmd, "autodestroy")) flags |= VIR_DOMAIN_START_AUTODESTROY; + if (vshCommandOptBool(cmd, "direct")) + flags |= VIR_DOMAIN_START_DIRECT; /* Prefer older API unless we have to pass a flag. */ if ((flags ? virDomainCreateWithFlags(dom, flags) @@ -1940,6 +1943,7 @@ static const vshCmdInfo info_restore[] = { static const vshCmdOptDef opts_restore[] = { {"file", VSH_OT_DATA, VSH_OFLAG_REQ, N_("the state to restore")}, + {"direct", VSH_OT_BOOL, 0, N_("use O_DIRECT when restoring")}, {NULL, 0, 0, NULL} }; @@ -1948,6 +1952,7 @@ cmdRestore(vshControl *ctl, const vshCmd *cmd) { const char *from = NULL; bool ret = true; + bool direct = vshCommandOptBool(cmd, "direct"); if (!vshConnectionUsability(ctl, ctl->conn)) return false; @@ -1955,7 +1960,9 @@ cmdRestore(vshControl *ctl, const vshCmd *cmd) if (vshCommandOptString(cmd, "file", &from) <= 0) return false; - if (virDomainRestore(ctl->conn, from) == 0) { + if ((direct ? virDomainRestoreFlags(ctl->conn, from, NULL, + VIR_DOMAIN_SAVE_DIRECT) + : virDomainRestore(ctl->conn, from)) == 0) { vshPrint(ctl, _("Domain restored from %s\n"), from); } else { vshError(ctl, _("Failed to restore domain from %s"), from); diff --git a/tools/virsh.pod b/tools/virsh.pod index 1e04464..8e8e9bc 100644 --- a/tools/virsh.pod +++ b/tools/virsh.pod @@ -592,10 +592,13 @@ domain actually reboots. The exact behavior of a domain when it reboots is set by the I<on_reboot> parameter in the domain's XML definition. -=item B<restore> I<state-file> +=item B<restore> I<state-file> [I<--direct>] Restores a domain from a B<virsh save> state file. See I<save> for more info. +If I<--direct> is specified, the restore uses O_DIRECT, which reduces +file system cache pressure but may slow down the operation. + B<Note>: To avoid corrupting file system contents within the domain, you should not reuse the saved state file for a second B<restore> unless you have also reverted all storage volumes back to the same contents as when @@ -794,6 +797,7 @@ The exact behavior of a domain when it shuts down is set by the I<on_shutdown> parameter in the domain's XML definition. =item B<start> I<domain-name> [I<--console>] [I<--paused>] [I<--autodestroy>] +[I<--direct>] Start a (previously defined) inactive domain, either from the last B<managedsave> state, or via a fresh boot if no managedsave state is @@ -802,7 +806,9 @@ used and supported by the driver; otherwise it will be running. If I<--console> is requested, attach to the console after creation. If I<--autodestroy> is requested, then the guest will be automatically destroyed when virsh closes its connection to libvirt, or otherwise -exits. +exits. If I<--direct> is specified, and managedsave state exists, +the restore uses O_DIRECT, which reduces file system cache pressure +but may slow down the operation. =item B<suspend> I<domain-id> -- 1.7.4.4

On Fri, Jul 15, 2011 at 08:42:44PM -0600, Eric Blake wrote:
* tools/virsh.c (cmdStart, cmdRestore): Add new flag. * tools/virsh.pod (start, restore): Document flags. ---
Counterpart to 4/8
tools/virsh.c | 9 ++++++++- tools/virsh.pod | 10 ++++++++-- 2 files changed, 16 insertions(+), 3 deletions(-)
diff --git a/tools/virsh.c b/tools/virsh.c index 3cf0618..00ea533 100644 --- a/tools/virsh.c +++ b/tools/virsh.c @@ -1481,6 +1481,7 @@ static const vshCmdOptDef opts_start[] = { #endif {"paused", VSH_OT_BOOL, 0, N_("leave the guest paused after creation")}, {"autodestroy", VSH_OT_BOOL, 0, N_("automatically destroy the guest when virsh disconnects")}, + {"direct", VSH_OT_BOOL, 0, N_("use O_DIRECT when loading")}, {NULL, 0, 0, NULL} };
ACK when flag name question is resolved. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Constraining the problem makes the solution easier to think about. * src/util/iohelper.c (runIO): Make read support easy. --- Could be squashed into 6/8. src/util/iohelper.c | 9 ++++++++- 1 files changed, 8 insertions(+), 1 deletions(-) diff --git a/src/util/iohelper.c b/src/util/iohelper.c index 82f62e1..f091d13 100644 --- a/src/util/iohelper.c +++ b/src/util/iohelper.c @@ -107,6 +107,13 @@ runIO(const char *path, int fd, int oflags, unsigned long long length) fdinname = path; fdout = STDOUT_FILENO; fdoutname = "stdout"; + /* To make the implementation simpler, we give up on any + * attempt to use O_DIRECT in a non-trivial manner. */ + if (direct && ((end = lseek(fd, 0, SEEK_CUR)) != 0 || length)) { + virReportSystemError(end < 0 ? errno : EINVAL, "%s", + _("O_DIRECT read needs entire seekable file")); + goto cleanup; + } break; case O_WRONLY: fdin = STDIN_FILENO; @@ -117,7 +124,7 @@ runIO(const char *path, int fd, int oflags, unsigned long long length) * attempt to use O_DIRECT in a non-trivial manner. */ if (direct && (end = lseek(fd, 0, SEEK_END)) != 0) { virReportSystemError(end < 0 ? errno : EINVAL, "%s", - _("O_DIRECT needs empty seekable file")); + _("O_DIRECT write needs empty seekable file")); goto cleanup; } break; -- 1.7.4.4

On Fri, Jul 15, 2011 at 08:48:18PM -0600, Eric Blake wrote:
Constraining the problem makes the solution easier to think about.
* src/util/iohelper.c (runIO): Make read support easy. ---
Could be squashed into 6/8.
src/util/iohelper.c | 9 ++++++++- 1 files changed, 8 insertions(+), 1 deletions(-)
diff --git a/src/util/iohelper.c b/src/util/iohelper.c index 82f62e1..f091d13 100644 --- a/src/util/iohelper.c +++ b/src/util/iohelper.c @@ -107,6 +107,13 @@ runIO(const char *path, int fd, int oflags, unsigned long long length) fdinname = path; fdout = STDOUT_FILENO; fdoutname = "stdout"; + /* To make the implementation simpler, we give up on any + * attempt to use O_DIRECT in a non-trivial manner. */ + if (direct && ((end = lseek(fd, 0, SEEK_CUR)) != 0 || length)) { + virReportSystemError(end < 0 ? errno : EINVAL, "%s", + _("O_DIRECT read needs entire seekable file")); + goto cleanup; + } break; case O_WRONLY: fdin = STDIN_FILENO; @@ -117,7 +124,7 @@ runIO(const char *path, int fd, int oflags, unsigned long long length) * attempt to use O_DIRECT in a non-trivial manner. */ if (direct && (end = lseek(fd, 0, SEEK_END)) != 0) { virReportSystemError(end < 0 ? errno : EINVAL, "%s", - _("O_DIRECT needs empty seekable file")); + _("O_DIRECT write needs empty seekable file")); goto cleanup; } break;
ACK, if this really works for saving guests. We need to skip over the initial header + XML in save files, so I would have thought we needed to support a non-empty seekable file. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 07/19/2011 11:50 AM, Daniel P. Berrange wrote:
On Fri, Jul 15, 2011 at 08:48:18PM -0600, Eric Blake wrote:
Constraining the problem makes the solution easier to think about.
* src/util/iohelper.c (runIO): Make read support easy. ---
Could be squashed into 6/8.
src/util/iohelper.c | 9 ++++++++-
ACK, if this really works for saving guests. We need to skip over the initial header + XML in save files, so I would have thought we needed to support a non-empty seekable file.
Yes, it really works. QEMU state files need not be seekable (it is the same format as used for migration, which can be passed over non-seekable sockets). Therefore, all we are doing is using iohelper to wrap a pipe() around the underlying file, at which point, all I/O is linear. We do the wrapping _prior_ to the point that libvirt writes the header, and also prior to the point that libvirt then hands the fd over to qemu, for qemu to write the rest of the file. Therefore, from iohelper's perspective, we don't care who did the writing, but the end result is that all i/o is aggregated and replayed into the O_DIRECT fd from start to finish. Same thing on read - libvirt reads, rather than seeks, the header. Therefore, libvirt reads the header, then hands the fd over to qemu to read the rest of the file. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

Wire up the restore direction to use iohelper for O_DIRECT. * src/qemu/qemu_driver.c (qemuddDomainObjStart) (qemuDomainSaveImageOpen, qemuDomainObjRestore) (qemuDomainObjStart): Add parameter. (qemudDomainStartWithFlags, qemuAutostartDomain) (qemuDomainRestoreFlags): Update callers. --- Could be squashed into 8/8. src/qemu/qemu_driver.c | 80 ++++++++++++++++++++++++++++++----------------- 1 files changed, 51 insertions(+), 29 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index a42ff8e..27b971f 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -117,11 +117,12 @@ static void processWatchdogEvent(void *data, void *opaque); static int qemudShutdown(void); -static int qemudDomainObjStart(virConnectPtr conn, - struct qemud_driver *driver, - virDomainObjPtr vm, - bool start_paused, - bool autodestroy); +static int qemuDomainObjStart(virConnectPtr conn, + struct qemud_driver *driver, + virDomainObjPtr vm, + bool start_paused, + bool autodestroy, + bool direct); static int qemudDomainGetMaxVcpus(virDomainPtr dom); @@ -149,9 +150,11 @@ qemuAutostartDomain(void *payload, const void *name ATTRIBUTE_UNUSED, void *opaq vm->def->name, err ? err->message : _("unknown error")); } else { + /* XXX need to wire direct autostart into qemu.conf */ if (vm->autostart && !virDomainObjIsActive(vm) && - qemudDomainObjStart(data->conn, data->driver, vm, false, false) < 0) { + qemuDomainObjStart(data->conn, data->driver, vm, + false, false, false) < 0) { err = virGetLastError(); VIR_ERROR(_("Failed to autostart VM '%s': %s"), vm->def->name, @@ -3604,18 +3607,20 @@ cleanup: return ret; } -static int ATTRIBUTE_NONNULL(3) ATTRIBUTE_NONNULL(4) +static int ATTRIBUTE_NONNULL(3) ATTRIBUTE_NONNULL(4) ATTRIBUTE_NONNULL(6) qemuDomainSaveImageOpen(struct qemud_driver *driver, const char *path, virDomainDefPtr *ret_def, - struct qemud_save_header *ret_header) + struct qemud_save_header *ret_header, + bool direct, virDirectFdPtr *directFd) { int fd; struct qemud_save_header header; char *xml = NULL; virDomainDefPtr def = NULL; - if ((fd = virFileOpenAs(path, O_RDONLY, 0, getuid(), getgid(), 0)) < 0) { + if ((fd = virFileOpenAs(path, O_RDONLY | (direct ? O_DIRECT : 0), 0, + getuid(), getgid(), 0)) < 0) { if ((fd != -EACCES && fd != -EPERM) || driver->user == getuid()) { qemuReportError(VIR_ERR_OPERATION_FAILED, @@ -3625,7 +3630,7 @@ qemuDomainSaveImageOpen(struct qemud_driver *driver, /* Opening as root failed, but qemu runs as a different user * that might have better luck. */ - if ((fd = virFileOpenAs(path, O_RDONLY, 0, + if ((fd = virFileOpenAs(path, O_RDONLY | (direct ? O_DIRECT : 0), 0, driver->user, driver->group, VIR_FILE_OPEN_AS_UID)) < 0) { qemuReportError(VIR_ERR_OPERATION_FAILED, @@ -3633,6 +3638,8 @@ qemuDomainSaveImageOpen(struct qemud_driver *driver, goto error; } } + if (direct && (*directFd = virDirectFdNew(&fd, path)) == NULL) + goto error; if (saferead(fd, &header, sizeof(header)) != sizeof(header)) { qemuReportError(VIR_ERR_OPERATION_FAILED, @@ -3811,8 +3818,9 @@ qemuDomainRestoreFlags(virConnectPtr conn, int fd = -1; int ret = -1; struct qemud_save_header header; + virDirectFdPtr directFd = NULL; - virCheckFlags(0, -1); + virCheckFlags(VIR_DOMAIN_SAVE_DIRECT, -1); if (dxml) { qemuReportError(VIR_ERR_INVALID_ARG, "%s", _("xml modification unsupported")); @@ -3821,7 +3829,8 @@ qemuDomainRestoreFlags(virConnectPtr conn, qemuDriverLock(driver); - fd = qemuDomainSaveImageOpen(driver, path, &def, &header); + fd = qemuDomainSaveImageOpen(driver, path, &def, &header, + flags & VIR_DOMAIN_SAVE_DIRECT, &directFd); if (fd < 0) goto cleanup; @@ -3840,6 +3849,8 @@ qemuDomainRestoreFlags(virConnectPtr conn, goto cleanup; ret = qemuDomainSaveImageStartVM(conn, driver, vm, &fd, &header, path); + if (virDirectFdClose(directFd) < 0) + VIR_WARN("Failed to close %s", path); if (qemuDomainObjEndJob(driver, vm) == 0) vm = NULL; @@ -3851,6 +3862,7 @@ qemuDomainRestoreFlags(virConnectPtr conn, cleanup: virDomainDefFree(def); VIR_FORCE_CLOSE(fd); + virDirectFdFree(directFd); if (vm) virDomainObjUnlock(vm); qemuDriverUnlock(driver); @@ -3868,14 +3880,17 @@ static int qemuDomainObjRestore(virConnectPtr conn, struct qemud_driver *driver, virDomainObjPtr vm, - const char *path) + const char *path, + bool direct) { virDomainDefPtr def = NULL; int fd = -1; int ret = -1; struct qemud_save_header header; + virDirectFdPtr directFd = NULL; - fd = qemuDomainSaveImageOpen(driver, path, &def, &header); + fd = qemuDomainSaveImageOpen(driver, path, &def, &header, + direct, &directFd); if (fd < 0) goto cleanup; @@ -3897,10 +3912,13 @@ qemuDomainObjRestore(virConnectPtr conn, def = NULL; ret = qemuDomainSaveImageStartVM(conn, driver, vm, &fd, &header, path); + if (virDirectFdClose(directFd) < 0) + VIR_WARN("Failed to close %s", path); cleanup: virDomainDefFree(def); VIR_FORCE_CLOSE(fd); + virDirectFdFree(directFd); return ret; } @@ -4114,11 +4132,13 @@ static int qemudNumDefinedDomains(virConnectPtr conn) { } -static int qemudDomainObjStart(virConnectPtr conn, - struct qemud_driver *driver, - virDomainObjPtr vm, - bool start_paused, - bool autodestroy) +static int +qemuDomainObjStart(virConnectPtr conn, + struct qemud_driver *driver, + virDomainObjPtr vm, + bool start_paused, + bool autodestroy, + bool direct) { int ret = -1; char *managed_save; @@ -4133,7 +4153,7 @@ static int qemudDomainObjStart(virConnectPtr conn, goto cleanup; if (virFileExists(managed_save)) { - ret = qemuDomainObjRestore(conn, driver, vm, managed_save); + ret = qemuDomainObjRestore(conn, driver, vm, managed_save, direct); if ((ret == 0) && (unlink(managed_save) < 0)) VIR_WARN("Failed to remove the managed state %s", managed_save); @@ -4159,14 +4179,15 @@ cleanup: } static int -qemudDomainStartWithFlags(virDomainPtr dom, unsigned int flags) +qemuDomainStartWithFlags(virDomainPtr dom, unsigned int flags) { struct qemud_driver *driver = dom->conn->privateData; virDomainObjPtr vm; int ret = -1; virCheckFlags(VIR_DOMAIN_START_PAUSED | - VIR_DOMAIN_START_AUTODESTROY, -1); + VIR_DOMAIN_START_AUTODESTROY | + VIR_DOMAIN_START_DIRECT, -1); qemuDriverLock(driver); vm = virDomainFindByUUID(&driver->domains, dom->uuid); @@ -4188,9 +4209,10 @@ qemudDomainStartWithFlags(virDomainPtr dom, unsigned int flags) goto endjob; } - if (qemudDomainObjStart(dom->conn, driver, vm, - (flags & VIR_DOMAIN_START_PAUSED) != 0, - (flags & VIR_DOMAIN_START_AUTODESTROY) != 0) < 0) + if (qemuDomainObjStart(dom->conn, driver, vm, + (flags & VIR_DOMAIN_START_PAUSED) != 0, + (flags & VIR_DOMAIN_START_AUTODESTROY) != 0, + (flags & VIR_DOMAIN_START_DIRECT) != 0) < 0) goto endjob; ret = 0; @@ -4207,9 +4229,9 @@ cleanup: } static int -qemudDomainStart(virDomainPtr dom) +qemuDomainStart(virDomainPtr dom) { - return qemudDomainStartWithFlags(dom, 0); + return qemuDomainStartWithFlags(dom, 0); } static int @@ -8615,8 +8637,8 @@ static virDriver qemuDriver = { .domainXMLToNative = qemuDomainXMLToNative, /* 0.6.4 */ .listDefinedDomains = qemudListDefinedDomains, /* 0.2.0 */ .numOfDefinedDomains = qemudNumDefinedDomains, /* 0.2.0 */ - .domainCreate = qemudDomainStart, /* 0.2.0 */ - .domainCreateWithFlags = qemudDomainStartWithFlags, /* 0.8.2 */ + .domainCreate = qemuDomainStart, /* 0.2.0 */ + .domainCreateWithFlags = qemuDomainStartWithFlags, /* 0.8.2 */ .domainDefineXML = qemudDomainDefine, /* 0.2.0 */ .domainUndefine = qemudDomainUndefine, /* 0.2.0 */ .domainAttachDevice = qemuDomainAttachDevice, /* 0.4.1 */ -- 1.7.4.4

On Sat, Jul 16, 2011 at 05:09:06PM -0600, Eric Blake wrote:
Wire up the restore direction to use iohelper for O_DIRECT.
* src/qemu/qemu_driver.c (qemuddDomainObjStart) (qemuDomainSaveImageOpen, qemuDomainObjRestore) (qemuDomainObjStart): Add parameter. (qemudDomainStartWithFlags, qemuAutostartDomain) (qemuDomainRestoreFlags): Update callers. ---
Could be squashed into 8/8.
src/qemu/qemu_driver.c | 80 ++++++++++++++++++++++++++++++----------------- 1 files changed, 51 insertions(+), 29 deletions(-)
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

When auto-starting a domain on libvirtd startup, let the user configure whether to have the VIR_DOMAIN_START_DIRECT flag effect. * src/qemu/qemu.conf (auto_start_direct): Document new variable. * src/qemu/libvirtd_qemu.aug (vnc_entry): Let augeas parse it. * src/qemu/qemu_conf.h (qemud_driver): Store new preference. * src/qemu/qemu_conf.c (qemudLoadDriverConfig): Parse it. * src/qemu/qemu_driver.c (qemuAutostartDomain): Honor it. --- Might be worth combining with 9/8. src/qemu/libvirtd_qemu.aug | 1 + src/qemu/qemu.conf | 8 ++++++++ src/qemu/qemu_conf.c | 4 ++++ src/qemu/qemu_conf.h | 2 ++ src/qemu/qemu_driver.c | 4 ++-- 5 files changed, 17 insertions(+), 2 deletions(-) diff --git a/src/qemu/libvirtd_qemu.aug b/src/qemu/libvirtd_qemu.aug index dea6770..a78cd10 100644 --- a/src/qemu/libvirtd_qemu.aug +++ b/src/qemu/libvirtd_qemu.aug @@ -42,6 +42,7 @@ module Libvirtd_qemu = | str_entry "dump_image_format" | str_entry "auto_dump_path" | bool_entry "auto_dump_direct" + | bool_entry "auto_start_direct" | str_entry "hugetlbfs_mount" | bool_entry "relaxed_acs_check" | bool_entry "vnc_allow_host_audio" diff --git a/src/qemu/qemu.conf b/src/qemu/qemu.conf index 2a0664d..48ae781 100644 --- a/src/qemu/qemu.conf +++ b/src/qemu/qemu.conf @@ -216,6 +216,14 @@ # # auto_dump_direct = 0 +# When a domain is configured to be auto-started, enabling this flag +# has the same effect as using the VIR_DOMAIN_START_DIRECT flag with the +# virDomainCreateWithFlags API. That is, the system uses O_DIRECT if +# possible, which puts less pressure on the file system cache but may +# cause slower operation. +# +# auto_start_direct = 0 + # If provided by the host and a hugetlbfs mount point is configured, # a guest may request huge page backing. When this mount point is # unspecified here, determination of a host mount point in /proc/mounts diff --git a/src/qemu/qemu_conf.c b/src/qemu/qemu_conf.c index cf6cb6b..144dbda 100644 --- a/src/qemu/qemu_conf.c +++ b/src/qemu/qemu_conf.c @@ -382,6 +382,10 @@ int qemudLoadDriverConfig(struct qemud_driver *driver, CHECK_TYPE ("auto_dump_direct", VIR_CONF_LONG); if (p) driver->autoDumpDirect = true; + p = virConfGetValue (conf, "auto_start_direct"); + CHECK_TYPE ("auto_start_direct", VIR_CONF_LONG); + if (p) driver->autoStartDirect = true; + p = virConfGetValue (conf, "hugetlbfs_mount"); CHECK_TYPE ("hugetlbfs_mount", VIR_CONF_STRING); if (p && p->str) { diff --git a/src/qemu/qemu_conf.h b/src/qemu/qemu_conf.h index bc025af..afc3ef4 100644 --- a/src/qemu/qemu_conf.h +++ b/src/qemu/qemu_conf.h @@ -121,6 +121,8 @@ struct qemud_driver { char *autoDumpPath; bool autoDumpDirect; + bool autoStartDirect; + pciDeviceList *activePciHostdevs; virBitmapPtr reservedVNCPorts; diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 27b971f..64fe3b9 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -150,11 +150,11 @@ qemuAutostartDomain(void *payload, const void *name ATTRIBUTE_UNUSED, void *opaq vm->def->name, err ? err->message : _("unknown error")); } else { - /* XXX need to wire direct autostart into qemu.conf */ if (vm->autostart && !virDomainObjIsActive(vm) && qemuDomainObjStart(data->conn, data->driver, vm, - false, false, false) < 0) { + false, false, + data->driver->autoStartDirect) < 0) { err = virGetLastError(); VIR_ERROR(_("Failed to autostart VM '%s': %s"), vm->def->name, -- 1.7.4.4

On Mon, Jul 18, 2011 at 09:45:34AM -0600, Eric Blake wrote:
When auto-starting a domain on libvirtd startup, let the user configure whether to have the VIR_DOMAIN_START_DIRECT flag effect.
* src/qemu/qemu.conf (auto_start_direct): Document new variable. * src/qemu/libvirtd_qemu.aug (vnc_entry): Let augeas parse it. * src/qemu/qemu_conf.h (qemud_driver): Store new preference. * src/qemu/qemu_conf.c (qemudLoadDriverConfig): Parse it. * src/qemu/qemu_driver.c (qemuAutostartDomain): Honor it. ---
Might be worth combining with 9/8.
src/qemu/libvirtd_qemu.aug | 1 + src/qemu/qemu.conf | 8 ++++++++ src/qemu/qemu_conf.c | 4 ++++ src/qemu/qemu_conf.h | 2 ++ src/qemu/qemu_driver.c | 4 ++-- 5 files changed, 17 insertions(+), 2 deletions(-)
diff --git a/src/qemu/libvirtd_qemu.aug b/src/qemu/libvirtd_qemu.aug index dea6770..a78cd10 100644 --- a/src/qemu/libvirtd_qemu.aug +++ b/src/qemu/libvirtd_qemu.aug @@ -42,6 +42,7 @@ module Libvirtd_qemu = | str_entry "dump_image_format" | str_entry "auto_dump_path" | bool_entry "auto_dump_direct" + | bool_entry "auto_start_direct" | str_entry "hugetlbfs_mount" | bool_entry "relaxed_acs_check" | bool_entry "vnc_allow_host_audio"
Again, flag naming question. ACK with that resolved Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

* tools/libvirt-guests.init.sh (start): Use SAVE_DIRECT. --- Patch 10/8 has a bug fixed here - the code must use $direct rather than "$direct" so as to elide the argument rather than inject an empty string argument when the option is not in use. The more I look at this series, the more I like the idea of adding both save and restore direct support at the same time, so I'll post a v2 that rebases things accordingly. tools/libvirt-guests.init.sh | 6 ++++-- 1 files changed, 4 insertions(+), 2 deletions(-) diff --git a/tools/libvirt-guests.init.sh b/tools/libvirt-guests.init.sh index 4e383c4..7e1dd51 100644 --- a/tools/libvirt-guests.init.sh +++ b/tools/libvirt-guests.init.sh @@ -144,6 +144,8 @@ start() { fi isfirst=true + direct= + test "x$SAVE_DIRECT" = x0 || direct=--direct while read uri list; do configured=false set -f @@ -173,7 +175,7 @@ start() { else sleep $START_DELAY fi - retval run_virsh "$uri" start "$name" >/dev/null && \ + retval run_virsh "$uri" start $direct "$name" >/dev/null && \ gettext "done"; echo fi fi @@ -194,7 +196,7 @@ suspend_guest() direct= test "x$SAVE_DIRECT" = x0 || direct=--direct printf %s "$label" - run_virsh "$uri" managedsave "$direct" "$guest" >/dev/null & + run_virsh "$uri" managedsave $direct "$guest" >/dev/null & virsh_pid=$! while true; do sleep 1 -- 1.7.4.4
participants (2)
-
Daniel P. Berrange
-
Eric Blake