On 11/09/2011 05:05 AM, Srivatsa S. Bhat wrote:
Add the core functions that implement the functionality of the API.
Suspend is done by using an asynchronous mechanism so that we can return
the status to the caller successfully before the host gets suspended. This
asynchronous operation is achieved by suspending the host in a separate
thread of execution.
To resume the host, an RTC alarm is set up (based on how long we want
to suspend) before suspending the host. When this alarm fires, the host
gets woken up.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat(a)linux.vnet.ibm.com>
---
include/libvirt/libvirt.h.in | 5 +
src/libvirt_private.syms | 7 +
src/nodeinfo.c | 220 ++++++++++++++++++++++++++++++++++++++++++
src/nodeinfo.h | 14 +++
src/qemu/qemu_driver.c | 5 +
src/util/threads-pthread.c | 17 +++
src/util/threads.h | 1
It looks weird seeing a public change (include/libvirt/libvirt.h.in)
mixed in with a driver-specific change (src/qemu/qemu_driver.c). I
would favor re-ordering the patches, by splitting patch 2/2 into two
parts (public API, then src/remote/remote_protocol.x implementation of
remote driver), then putting this patch 1/2 as the third patch. We
also need a fourth patch, to tools/virsh.{pod,c} to expose the new
feature through virsh.
7 files changed, 268 insertions(+), 1 deletions(-)
diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in
index aa320b6..25f1c9b 100644
--- a/include/libvirt/libvirt.h.in
+++ b/include/libvirt/libvirt.h.in
@@ -3357,6 +3357,11 @@ typedef struct _virTypedParameter virMemoryParameter;
*/
typedef virMemoryParameter *virMemoryParameterPtr;
+typedef enum {
+ VIR_S3 = 1, /* Suspend-to-RAM */
+ VIR_S4 = 2 /* Suspend-to-disk */
We tend to use trailing commas in enum decls (we require C99, after
all), so that if we later add VIR_... = 3, we don't have to edit the
VIR_S4 = 2, /* Suspend-to-disk */ line.
+} virSuspendState;
+
This hunk should be moved alongside the half of 2/2 that deals with the
public API. Also, float it up to appear earlier in the file;
virMemoryParameter is in the section of deprecated names at the bottom
for a reason, but virSuspendState is not deprecated, so it should appear
sooner. I wonder if we should name this VIR_NODE_S3 and
virNodeSuspendState, since it a different concept of 'suspend' than
virDomainSuspend.
+++ b/src/libvirt_private.syms
@@ -844,6 +844,13 @@ nodeGetCellsFreeMemory;
nodeGetFreeMemory;
nodeGetInfo;
nodeGetMemoryStats;
+virSuspendLock;
+virSuspendUnlock;
+virSuspendInit;
+nodeSuspendForDuration;
+setNodeWakeup;
+nodeSuspend;
+virSuspendCleanup;
Sort these lines.
@@ -897,3 +905,215 @@ unsigned long long
nodeGetFreeMemory(virConnectPtr conn ATTRIBUTE_UNUSED)
return 0;
}
#endif
+
+
+static int initialized;
+virMutex virSuspendMutex;
+
+int virSuspendLock(void)
+{
+ return virMutexTryLock(&virSuspendMutex);
+}
+
+void virSuspendUnlock(void)
+{
+ virMutexUnlock(&virSuspendMutex);
+}
+
+/**
+ * virSuspendInit:
+ *
+ * Get the low power states supported by the host, such as Suspend-to-RAM (S3)
+ * or Suspend-to-Disk (S4), so that a request to suspend/hibernate the host
+ * can be handled appropriately based on this information.
+ *
+ * Returns 0 if successful, and -1 in case of error.
+ */
+int virSuspendInit(void)
+{
+
+ if (virMutexInit(&virSuspendMutex) < 0)
+ return -1;
+
+ /* Get the power management capabilities supported by the host.
+ * Ensure that this is done only once, by using the 'initialized'
+ * variable.
Should this function be marked static? If it should only be called
once, and 'initialized' is static, then no one outside of this file
should be calling it.
+ */
+ if (virGetPMCapabilities(&hostPMFeatures) < 0) {
+ VIR_ERROR(_("Failed to get host power management features"));
+ return -1;
+ }
+
+ return 0;
+}
+
+
+/**
+ * nodeSuspendForDuration:
+ * @conn: pointer to the hypervisor connection
+ * @state: the state to which the host must be suspended to -
+ * VIR_HOST_PM_S3 (Suspend-to-RAM)
+ * or VIR_HOST_PM_S4 (Suspend-to-disk)
+ * @duration: the time duration in seconds, for which the host
+ * must be suspended
+ *
+ * Suspend the node (host machine) for the given duration of time
+ * in the specified state (such as S3 or S4). Resume the node
+ * after the time duration is complete.
+ *
+ * An RTC alarm is set appropriately to wake up the node from
+ * its sleep state. Then the actual node suspend is carried out
+ * asynchronously in another thread, after a short time delay
+ * so as to give enough time for this function to return status
+ * to its caller through the connection.
+ *
+ * Returns 0 in case the node is going to be suspended after a short
+ * delay, -1 if suspending the node is not supported, or if a
+ * previous suspend operation is still in progress.
Do we need a way to tune the delay before the suspend? I'm worried that
a heavily-loaded machine will fail to send the response prior to the
suspend. Maybe the best we can do is state that the response is
best-effort, and should generally happen due to the short delay but
might be missed if the machine suspends first. Or can we wire things
into virnetserver.c and friends to guarantee that we don't attempt the
suspend until after the response has been successfully delivered?
+ */
+int nodeSuspendForDuration(virConnectPtr conn ATTRIBUTE_UNUSED,
+ int state,
+ unsigned long long duration)
+{
+ static virThread thread;
+ char *cmdString;
+
+ if (!initialized) {
+ if(virSuspendInit() < 0)
+ return -1;
+ initialized = 1;
+ }
+
+ /*
+ * Ensure that we are the only ones trying to suspend.
+ * Fail if somebody has already initiated suspend.
+ */
+ if (!virSuspendLock())
+ return -1;
+
+ /* Check if the host supports the requested suspend state */
+ switch (state) {
+ case VIR_S3:
This says VIR_S3, but the docs above said VIR_HOST_PM_S3. We need to
have consistent naming between the code and docs.
+ if (hostPMFeatures & VIR_S3) {
+ cmdString = strdup("pm-suspend");
+ if (cmdString == NULL) {
+ virReportOOMError();
+ goto cleanup;
+ }
+ break;
+ }
+ goto cleanup;
+ case VIR_S4:
+ if (hostPMFeatures & VIR_S4) {
+ cmdString = strdup("pm-hibernate");
+ if (cmdString == NULL) {
+ virReportOOMError();
+ goto cleanup;
+ }
+ break;
+ }
+ goto cleanup;
+ default:
+ goto cleanup;
+ }
+
+ /* Just set the RTC alarm. Don't suspend yet. */
+ if (setNodeWakeup(duration) < 0) {
+ nodeReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+ _("Failed to set up the RTC alarm for node
wakeup\n"));
+ goto cleanup;
+ }
+
+ if (virThreadCreate(&thread, false, nodeSuspend, (void *)cmdString) < 0) {
+ nodeReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+ _("Failed to create thread to suspend the host\n"));
+ goto cleanup;
+ }
+
+ /* virSuspendUnlock() must be called only after resume is complete,
+ * in the thread that did the suspend and resume.
+ */
+ return 0;
+
+cleanup:
+ virSuspendUnlock();
Memory leak - you don't call VIR_FREE(cmdString) on all paths.
+ return -1;
+}
+
+#define MAX_SUSPEND_DURATION 365*24*60*60 /* 1 year, a reasonable max? */
Any time you impose an arbitrary limit (and 1 year can be deemed
arbitrary), you are risking problems. There should be a real technical
reason why (and if) we have to impose a limit, not an arbitrary time
duration; otherwise, I would favor letting the user sleep as long as the
RTC clock supports (even if the sleep amount seems ridiculous to me).
This is as far as I got on my review today; I'll pick up here tomorrow.
--
Eric Blake eblake(a)redhat.com +1-919-301-3266
Libvirt virtualization library
http://libvirt.org