[libvirt] RFC [0/3]: A proposal for lock manager plugins

A few weeks back David introduced sync-manager as a means of protecting against the same VM being started on multiple hosts: https://www.redhat.com/archives/libvir-list/2010-August/msg00179.html

This is obviously a desirable feature for libvirt, but we don't want a hardcoded dependency on one particular implementation. We can achieve isolation between libvirt & sync-manager, and allow for alternative implementations, by specifying a formal plugin API for supervisor processes. What follows is my mockup of what a plugin API would look like, its internal libvirt API and an outline of the usage scenarios in pseudo code.

At the minimum level of compliance, a plugin implementation provides per-VM exclusion. Optionally a plugin can declare that it:

 - supports locks against individual resources (disks)
 - supports hotplug of individual resources
 - supports migration of the supervised process

Migration/hotplug will be forbidden if those capabilities aren't declared by the plugin.

In parallel with David's work on sync-manager, I intend to write a simple plugin implementation based solely on fcntl() locking. It is important that we have 2 independent implementations to prove the validity of the plugin API. The fcntl() impl should provide a zero-config implementation we can deploy out of the box, giving protection against 2 VMs using the same disk image on one host, and limited protection in multi-host scenarios with shared filesystems (no protection for shared block devs). Perhaps we should have a 3rd impl based on libdlm too, for Red Hat ClusterSuite scenarios, though sync-manager may well be general & simple enough to easily cope with this already.

Daniel
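As a rough sketch of the sort of zero-config locking the fcntl() plugin could do (purely illustrative; the helper below is not part of the proposal):

#include <fcntl.h>
#include <unistd.h>
#include <errno.h>

/* Sketch: take a whole-file advisory lock on a disk image, exclusive for
 * read-write use or shared for readonly use. This blocks a second VM using
 * the same image on one host, and works to a limited degree over shared
 * filesystems; it gives no protection for shared block devices. */
static int acquire_disk_lock(const char *path, int readonly)
{
    struct flock fl = {
        .l_type   = readonly ? F_RDLCK : F_WRLCK,
        .l_whence = SEEK_SET,
        .l_start  = 0,
        .l_len    = 0,               /* 0 == lock the entire file */
    };
    int fd = open(path, readonly ? O_RDONLY : O_RDWR);

    if (fd < 0)
        return -1;
    if (fcntl(fd, F_SETLKW, &fl) < 0) {  /* wait until the lock is granted */
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;  /* lock is held for as long as this fd stays open */
}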

/**
 * virLockManager:
 *
 * A lock manager is a process that will supervise another process.
 * It will obtain & hold locks/leases for resources the supervised
 * process uses
 */
typedef struct virLockManager virLockManager;
typedef virLockManager *virLockManagerPtr;

typedef enum {
    VIR_LOCK_MANAGER_RESOURCE_READONLY = (1 << 0),
    VIR_LOCK_MANAGER_RESOURCE_SHARED = (1 << 1),
} virLockManagerResourceFlags;

enum {
    VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK = 0,
} virLockManagerResourceType;

enum {
    VIR_LOCK_MANAGER_NEW_DOMAIN = 0,
} virLockManagerNewType;

enum {
    VIR_LOCK_MANAGER_NEW_MIGRATE = (1 << 0),
    VIR_LOCK_MANAGER_NEW_ATTACH = (1 << 1),
} virLockManagerNewFlags;

enum {
    VIR_LOCK_MANAGER_MIGRATE_CANCEL = (1 << 0);
} virLockManagerCompleteMigrateFlags;

/**
 * virLockManagerNew:
 * @driver: the driver implementation to use
 * @type: the type of process to be supervised
 * @flags: one of virLockManagerNewFlags
 *
 * Create a new context to supervise a process, usually a virtual
 * machine. For a normal startup, flags can be 0. For incoming
 * migration, use VIR_LOCK_MANAGER_NEW_MIGRATE
 *
 * Returns a new lock manager context
 */
virLockManagerPtr virLockManagerNew(virLockManagerDriverPtr driver, unsigned int type, unsigned int flags);

/**
 * virLockManagerSetParameter:
 * @manager: the lock manager context
 * @key: the parameter name
 * @value: the parameter value
 *
 * Set a configuration parameter for the managed process. A process
 * of VIR_LOCK_MANAGER_START_DOMAIN will be given at least 3 parameters:
 *
 *  - id: the domain unique id
 *  - uuid: the domain uuid
 *  - name: the domain name
 *
 * There may be other parameters specific to the lock manager plugin
 * that are provided for the managed process
 *
 * This should only be called prior to the supervised process being
 * started. Setting parameters may change the set of argv returned
 * by virLockManagerGetSupervisorArgs.
 */
int virLockManagerSetParameter(virLockManagerPtr manager, const char *key, const char *value);

/**
 * virLockManagerRegisterResource:
 * @manager: the lock manager context
 * @type: the resource type virLockManagerResourceType
 * @name: the resource name
 * @flags: the resource access flags
 *
 * Register a resource that is required when the process starts up.
 * This may only be called prior to the process being started. The
 * format of @name varies according to the resource @type. A
 * VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK will have a fully qualified
 * file path.
 *
 * If no flags are given, the resource is assumed to be used in
 * exclusive, read-write mode. Access can be relaxed to readonly,
 * or shared read-write.
 */
int virLockManagerRegisterResource(virLockManagerPtr manager, unsigned int type, const gchar *name, unsigned int flags);

/**
 * virLockManagerAcquireResource:
 * @manager: the lock manager context
 * @type: the resource type virLockManagerResourceType
 * @name: the resource name
 * @flags: the resource access flags
 *
 * Dynamically acquire a resource for a running process. This may
 * only be called once the process has been started. The format of
 * @name varies according to the resource @type. A
 * VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK will have a fully qualified
 * file path.
 *
 * If no flags are given, the resource is assumed to be used in
 * exclusive, read-write mode. Access can be relaxed to readonly,
 * or shared read-write.
 */
int virLockManagerAcquireResource(virLockManagerPtr manager, const gchar *file, unsigned int flags);

/**
 * virLockManagerReleaseResource:
 * @manager: the lock manager context
 * @type: the resource type virLockManagerResourceType
 * @name: the resource name
 * @flags: the resource access flags
 *
 * Dynamically release a resource for a running process. This may
 * only be called after the process has been started. The format of
 * @name varies according to the resource @type. A
 * VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK will have a fully qualified
 * file path.
 *
 * If no flags are given, the resource is assumed to be used in
 * exclusive, read-write mode. Access can be relaxed to readonly,
 * or shared read-write.
 */
int virLockManagerReleaseResource(virLockManagerPtr manager, const gchar *file, unsigned int flags);

/**
 * virLockManager:
 * @manager: the lock manager context
 *
 * Retrieve the path of the supervisor binary
 */
char *virLockManagerGetSupervisorPath(virLockManagerPtr manager);

/**
 * virLockManager:
 * @manager: the lock manager context
 *
 * Retrieve the security context of the supervisor binary
 */
char *virLockManagerGetSupervisorContext(virLockManagerPtr manager);

/**
 * virLockManager:
 * @manager: the lock manager context
 * @argv: returned argument values
 * @argc: returned argument count
 *
 * Retrieve custom argv to pass to the supervisor binary ahead of
 * the normal process binary & argv. It is recommended that the
 * last argv be '--' to allow reliable determination of the last
 * supervisor arg.
 */
int virLockManagerGetSupervisorArgs(virLockManagerPtr manager, char ***argv, int *argc, unsigned int flags);

/**
 * virLockManagerPrepareMigrate:
 * @manager: the lock manager context
 * @targetURI: destination of the migration
 *
 * Notify the supevisor that the process is about to be migrated
 * to another host at @targetURI
 */
int virLockManagerPrepareMigrate(virLockManagerPtr manager, const char *targetURI, unsigned int flags);

/**
 * virLockManagerCompleteMigrateIn:
 * @manager: the lock manager context
 * @sourceURI: the origin of the migration
 *
 * Notify the supervisor that the process has completed migration.
 * If the migration is aborted, then the @flags should be
 * VIR_LOCK_MANAGER_MIGRATE_CANCEL
 */
int virLockManagerCompleteMigrateIn(virLockManagerPtr manager, const char *sourceURI, unsigned int flags);

/**
 * virLockManagerCompleteMigrateOut:
 * @manager: the lock manager context
 * @targetURI: the destination of the migration
 *
 * Notify the supervisor that the process has completed migration.
 * If the migration is aborted, then the @flags should be
 * VIR_LOCK_MANAGER_MIGRATE_CANCEL
 */
int virLockManagerCompleteMigrateOut(virLockManagerPtr manager, const char *targetURI, unsigned int flags);

/**
 * virLockManagerGetChild:
 * @manager: the lock manager context
 *
 * Obtain the PID of the managed process
 */
int virLockManagerGetChild(virLockManagerPtr manager, pid_t *pid, unsigned int flags);

/**
 * virLockManager:
 * @manager: the lock manager context
 *
 * Inform the supervisor that the supervised process has been,
 * or can be stopped.
 */
int virLockManagerShutdown(virLockManagerPtr manager, unsigned int flags);

/**
 * virLockManager:
 * @manager: the lock manager context
 *
 * Release resources. If the supervised process is still running,
 * it will be killed with prejudice
 */
void virLockManagerFree(virLockManagerPtr manager);
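To illustrate the intended calling sequence, here is a rough sketch of how the QEMU driver might drive this API when starting a guest. Error handling is omitted, the parameter values and disk paths are made up, and the exact ordering around spawning the supervisor is only indicative:

/* Hypothetical usage from the QEMU driver, assuming 'driver' was
 * obtained from a loaded plugin. */
virLockManagerPtr mgr;
char **args;
int nargs;

mgr = virLockManagerNew(driver, VIR_LOCK_MANAGER_NEW_DOMAIN, 0);
virLockManagerSetParameter(mgr, "id", "2");
virLockManagerSetParameter(mgr, "uuid", "c7a2edbd-edaf-9455-926a-d65c16db1809");
virLockManagerSetParameter(mgr, "name", "demo-guest");

/* Register every disk before the process starts */
virLockManagerRegisterResource(mgr, VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK,
                               "/var/lib/libvirt/images/demo.img", 0);
virLockManagerRegisterResource(mgr, VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK,
                               "/var/lib/libvirt/images/install.iso",
                               VIR_LOCK_MANAGER_RESOURCE_READONLY);

/* Build the supervisor command line that wraps the QEMU argv */
virLockManagerGetSupervisorArgs(mgr, &args, &nargs, 0);

/* ... exec supervisor + QEMU, run the guest ... */

virLockManagerShutdown(mgr, 0);
virLockManagerFree(mgr);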

On 09/10/2010 10:00 AM, Daniel P. Berrange wrote:
/** * virLockManagerSetParameter: * @manager: the lock manager context * @key: the parameter name * @value: the parameter value * * Set a configuration parameter for the managed process. * A process of VIR_LOCK_MANAGER_START_DOMAIN will be * given at least 3 parameters: * * - id: the domain unique id * - uuid: the domain uuid * - name: the domain name * * There may be other parameters specific to the lock manager * plugin that are provided for the managed process * * This should only be called prior to the supervised process * being started. Setting parameters may change the set of * argv returned by virLockManagerGetSupervisorArgs.
Returns 0 on success, -1 on failure? I'm guessing this is general usage throughout this file, but should it be mentioned per function, or just once up front?
/** * virLockManagerRegisterResource: * @manager: the lock manager context * @type: the resource type virLockManagerResourceType * @name: the resource name * @flags: the resource access flags * * Register a resource that is required when the process * starts up. This may only be called prior to the process * being started. The format of @name varies according to * the resource @type. A VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK * will have a fully qualified file path. * * If no flags are given, the resource is assumed to be * used in exclusive, read-write mode. Access can be * relaxed to readonly, or shared read-write.
Returns 0 on success, -1 on failure?
/** * virLockManagerAcquireResource: * @manager: the lock manager context * @type: the resource type virLockManagerResourceType * @name: the resource name * @flags: the resource access flags * * Dynamically acquire a resource for a running process. * This may only be called once the process has been * started. The format of @name varies according to * the resource @type. A VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK * will have a fully qualified file path. * * If no flags are given, the resource is assumed to be * used in exclusive, read-write mode. Access can be * relaxed to readonly, or shared read-write.
Does this guarantee that lock is not obtained on failure? Should flags include VIR_LOCK_ACQUIRE_PROBE, which then allows the choice between blocking to wait for the lock (flags==0) or an immediate return of -2 if the lock is already owned by another process (flags==1)?
/** * virLockManagerReleaseResource: * @manager: the lock manager context * @type: the resource type virLockManagerResourceType * @name: the resource name * @flags: the resource access flags * * Dynamically release a resource for a running process. * This may only be called after the process has been * started. The format of @name varies according to * the resource @type. A VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK * will have a fully qualified file path. * * If no flags are given, the resource is assumed to be * used in exclusive, read-write mode. Access can be * relaxed to readonly, or shared read-write.
If this fails, is the resource state indeterminate, or is it still guaranteed to be locked?
/** * virLockManagerPrepareMigrate: * @manager: the lock manager context * @targetURI: destination of the migration * * Notify the supevisor that the process is about to be migrated
s/supevisor/supervisor/
/** * virLockManager: * @manager: the lock manager context * * Inform the supervisor that the supervised process has * been, or can be stopped.
Does this automatically release any locks held by the manager, or fail if the manager still owns locks?
/** * virLockManager: * @manager: the lock manager context * * Release resources. If the supervised process is still * running, it will be killed with prejudice
Does this first attempt to automatically release any locks held by the manager? -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On Fri, Sep 10, 2010 at 01:50:56PM -0600, Eric Blake wrote:
On 09/10/2010 10:00 AM, Daniel P. Berrange wrote:
/** * virLockManagerSetParameter: * @manager: the lock manager context * @key: the parameter name * @value: the parameter value * * Set a configuration parameter for the managed process. * A process of VIR_LOCK_MANAGER_START_DOMAIN will be * given at least 3 parameters: * * - id: the domain unique id * - uuid: the domain uuid * - name: the domain name * * There may be other parameters specific to the lock manager * plugin that are provided for the managed process * * This should only be called prior to the supervised process * being started. Setting parameters may change the set of * argv returned by virLockManagerGetSupervisorArgs.
Returns 0 on success, -1 on failure? I'm guessing this is general usage throughout this file, but should it be mentioned per function, or just once up front?
Yes, they're all following that normal convention.
/** * virLockManagerAcquireResource: * @manager: the lock manager context * @type: the resource type virLockManagerResourceType * @name: the resource name * @flags: the resource access flags * * Dynamically acquire a resource for a running process. * This may only be called once the process has been * started. The format of @name varies according to * the resource @type. A VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK * will have a fully qualified file path. * * If no flags are given, the resource is assumed to be * used in exclusive, read-write mode. Access can be * relaxed to readonly, or shared read-write.
Does this guarantee that lock is not obtained on failure? Should flags include VIR_LOCK_ACQUIRE_PROBE, which then allows the choice between blocking to wait for the lock (flags==0) or an immediate return of -2 if the lock is already owned by another process (flags==1)?
I can't really think of a case when libvirt would need to do a non-blocking probe, so I left that out for simplicity. Do we need to guarantee that the lock is not obtained on failure? I don't think that is necessary to specify for safety, so I'd just leave it undefined.
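FWIW, in an fcntl() based impl both behaviours come from the same call, so a probe flag would be cheap to add later if we ever wanted it; an illustrative sketch (not part of the proposal):

#include <fcntl.h>
#include <errno.h>

/* Sketch only: F_SETLKW blocks until the lock is granted, while F_SETLK
 * fails at once with EACCES/EAGAIN if another process holds it, so a
 * hypothetical probe flag is just a choice of command. */
static int lock_fd(int fd, short type, int probe)
{
    struct flock fl = { .l_type = type, .l_whence = SEEK_SET };

    if (fcntl(fd, probe ? F_SETLK : F_SETLKW, &fl) == 0)
        return 0;
    if (probe && (errno == EACCES || errno == EAGAIN))
        return -2;   /* already owned elsewhere, as Eric suggested */
    return -1;
}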
/** * virLockManagerReleaseResource: * @manager: the lock manager context * @type: the resource type virLockManagerResourceType * @name: the resource name * @flags: the resource access flags * * Dynamically release a resource for a running process. * This may only be called after the process has been * started. The format of @name varies according to * the resource @type. A VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK * will have a fully qualified file path. * * If no flags are given, the resource is assumed to be * used in exclusive, read-write mode. Access can be * relaxed to readonly, or shared read-write.
If this fails, is the resource state indeterminate, or is it still guaranteed to be locked?
I think it is sufficient to leave it undefined, since that is still safe.
/** * virLockManager:
Oops, virLockManagerShutdown
* @manager: the lock manager context * * Inform the supervisor that the supervised process has * been, or can be stopped.
Does this automatically release any locks held by the manager, or fail if the manager still owns locks?
The intention is that all locks will be released, but as above this is not critical from a safety POV.
/** * virLockManager: * @manager: the lock manager context * * Release resources. If the supervised process is still * running, it will be killed with prejudice
Does this first attempt to automatically release any locks held by the manager?
I intend that it will call virLockManagerShutdown(), so in that sense, yes.

Daniel

On Fri, 2010-09-10 at 17:00 +0100, Daniel P. Berrange wrote:
/** * virLockManager: * * A lock manager is a process that will supervise another * process. It will obtain & hold locks/leases for resources * the supervised process uses */
typedef struct virLockManager virLockManager; typedef virLockManager *virLockManagerPtr;
typedef enum { VIR_LOCK_MANAGER_RESOURCE_READONLY = (1 << 0), VIR_LOCK_MANAGER_RESOURCE_SHARED = (1 << 1), } virLockManagerResourceFlags;
enum { VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK = 0, } virLockManagerResourceType;
enum { VIR_LOCK_MANAGER_NEW_DOMAIN = 0, } virLockManagerNewType;
enum { VIR_LOCK_MANAGER_NEW_MIGRATE = (1 << 0), VIR_LOCK_MANAGER_NEW_ATTACH = (1 << 1), } virLockManagerNewFlags;
enum { VIR_LOCK_MANAGER_MIGRATE_CANCEL = (1 << 0); } virLockManagerCompleteMigrateFlags;
/** * virLockManagerNew: * @driver: the driver implementation to use * @type: the type of process to be supervised * @flags: one of virLockManagerNewFlags * * Create a new context to supervise a process, usually * a virtual machine. For a normal startup, flags can * be 0. For incoming migration, use VIR_LOCK_MANAGER_NEW_MIGRATE * * Returns a new lock manager context */ virLockManagerPtr virLockManagerNew(virLockManagerDriverPtr driver, unsigned int type, unsigned int flags);
/** * virLockManagerSetParameter: * @manager: the lock manager context * @key: the parameter name * @value: the parameter value * * Set a configuration parameter for the managed process. * A process of VIR_LOCK_MANAGER_START_DOMAIN will be * given at least 3 parameters: * * - id: the domain unique id * - uuid: the domain uuid * - name: the domain name * * There may be other parameters specific to the lock manager * plugin that are provided for the managed process * * This should only be called prior to the supervised process * being started. Setting parameters may change the set of * argv returned by virLockManagerGetSupervisorArgs. */ int virLockManagerSetParameter(virLockManagerPtr manager, const char *key, const char *value);
/** * virLockManagerRegisterResource: * @manager: the lock manager context * @type: the resource type virLockManagerResourceType * @name: the resource name * @flags: the resource access flags * * Register a resource that is required when the process * starts up. This may only be called prior to the process * being started. The format of @name varies according to * the resource @type. A VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK * will have a fully qualified file path. * * If no flags are given, the resource is assumed to be * used in exclusive, read-write mode. Access can be * relaxed to readonly, or shared read-write. */ int virLockManagerRegisterResource(virLockManagerPtr manager, unsigned int type, const gchar *name, unsigned int flags);
/** * virLockManagerAcquireResource: * @manager: the lock manager context * @type: the resource type virLockManagerResourceType * @name: the resource name * @flags: the resource access flags * * Dynamically acquire a resource for a running process. * This may only be called once the process has been * started. The format of @name varies according to * the resource @type. A VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK * will have a fully qualified file path. * * If no flags are given, the resource is assumed to be * used in exclusive, read-write mode. Access can be * relaxed to readonly, or shared read-write. */ int virLockManagerAcquireResource(virLockManagerPtr manager, const gchar *file, unsigned int flags);
/** * virLockManagerReleaseResource: * @manager: the lock manager context * @type: the resource type virLockManagerResourceType * @name: the resource name * @flags: the resource access flags * * Dynamically release a resource for a running process. * This may only be called after the process has been * started. The format of @name varies according to * the resource @type. A VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK * will have a fully qualified file path. * * If no flags are given, the resource is assumed to be * used in exclusive, read-write mode. Access can be * relaxed to readonly, or shared read-write. */ int virLockManagerReleaseResource(virLockManagerPtr manager, const gchar *file, unsigned int flags);
/** * virLockManager: * @manager: the lock manager context * * Retrieve the path of the supervisor binary */ char *virLockManagerGetSupervisorPath(virLockManagerPtr manager);
/** * virLockManager: * @manager: the lock manager context * * Retrieve the security context of the supervisor binary */ char *virLockManagerGetSupervisorContext(virLockManagerPtr manager);
/** * virLockManager: * @manager: the lock manager context * @argv: returned argument values * @argc: returned argument count * * Retrieve custom argv to pass to the supervisor binary * ahead of the normal process binary & argv. It is recommended * that the last argv be '--' to allow reliable determination * of the last supervisor arg. */ int virLockManagerGetSupervisorArgs(virLockManagerPtr manager, char ***argv, int *argc, unsigned int flags);
/** * virLockManagerPrepareMigrate: * @manager: the lock manager context * @targetURI: destination of the migration * * Notify the supevisor that the process is about to be migrated * to another host at @targetURI */

[SM] I don't think migration is a topic that should be managed at the lock level. Furthermore, in sync_manager's case there are situations where you cannot perform a clean handover (like when the source host loses connection to the storage and we are migrating to another host to solve this). As I stated in my previous e-mail, I think that lock handover is a needlessly complex use case to have a special algorithm for in sync_manager. Just releasing on the source side and reacquiring on the target (and making sure no one got in the middle) before starting the migration is simpler. You could put it in a special verb for handovers that implements this logic, but I don't think it should be in the lock API but rather at a higher API level.

int virLockManagerPrepareMigrate(virLockManagerPtr manager, const char *targetURI, unsigned int flags); /** * virLockManagerCompleteMigrateIn: * @manager: the lock manager context * @sourceURI: the origin of the migration * * Notify the supervisor that the process has completed * migration. If the migration is aborted, then the @flags * should be VIR_LOCK_MANAGER_MIGRATE_CANCEL */ int virLockManagerCompleteMigrateIn(virLockManagerPtr manager, const char *sourceURI, unsigned int flags); /** * virLockManagerCompleteMigrateOut: * @manager: the lock manager context * @targetURI: the destination of the migration * * Notify the supervisor that the process has completed * migration. If the migration is aborted, then the @flags * should be VIR_LOCK_MANAGER_MIGRATE_CANCEL */ int virLockManagerCompleteMigrateOut(virLockManagerPtr manager, const char *targetURI, unsigned int flags);
/** * virLockManagerGetChild: * @manager: the lock manager context * * Obtain the PID of the managed process */ int virLockManagerGetChild(virLockManagerPtr manager, pid_t *pid, unsigned int flags);
/** * virLockManager: * @manager: the lock manager context * * Inform the supervisor that the supervised process has * been, or can be stopped. */ int virLockManagerShutdown(virLockManagerPtr manager, unsigned int flags);
/** * virLockManager: * @manager: the lock manager context * * Release resources. If the supervised process is still * running, it will be killed with prejudice */ void virLockManagerFree(virLockManagerPtr manager);

On Sun, Sep 12, 2010 at 02:22:04PM +0200, Saggi Mizrahi wrote:
On Fri, 2010-09-10 at 17:00 +0100, Daniel P. Berrange wrote:
enum { VIR_LOCK_MANAGER_NEW_MIGRATE = (1 << 0), VIR_LOCK_MANAGER_NEW_ATTACH = (1 << 1), } virLockManagerNewFlags;
enum { VIR_LOCK_MANAGER_MIGRATE_CANCEL = (1 << 0); } virLockManagerCompleteMigrateFlags;
/** * virLockManagerNew: * @driver: the driver implementation to use * @type: the type of process to be supervised * @flags: one of virLockManagerNewFlags * * Create a new context to supervise a process, usually * a virtual machine. For a normal startup, flags can * be 0. For incoming migration, use VIR_LOCK_MANAGER_NEW_MIGRATE * * Returns a new lock manager context */ virLockManagerPtr virLockManagerNew(virLockManagerDriverPtr driver, unsigned int type, unsigned int flags);
/** * virLockManagerPrepareMigrate: * @manager: the lock manager context * @targetURI: destination of the migration * * Notify the supevisor that the process is about to be migrated * to another host at @targetURI */
[SM] I don't think migration is a topic that should be managed at the lock level. Furthermore, in sync_manager's case there are situations where you cannot perform a clean handover (like when the source host loses connection to the storage and we are migrating to another host to solve this). As I stated in my previous e-mail, I think that lock handover is a needlessly complex use case to have a special algorithm for in sync_manager. Just releasing on the source side and reacquiring on the target (and making sure no one got in the middle) before starting the migration is simpler. You could put it in a special verb for handovers that implements this logic, but I don't think it should be in the lock API but rather at a higher API level.
You still have to know at which point to release the lock on the source side & at which point to re-acquire it on the destination. There are two choices there:

 a. Release before migration starts + acquire when the QEMU -incoming process starts
 b. Release after migration completes + acquire when QEMU resumes CPUs

I could rename the driver APIs here so they didn't include the word 'migrate', but I don't see how we can remove any of these APIs - there are potential actions that a lock/lease manager may want to perform at each of these points in migration. The goal with the driver API is to try not to restrict the possible implementation strategies that a lock manager can use. Encoding a special verb for handover seems overly restrictive to me, since it is forcing a particular implementation strategy.

Regards,
Daniel

Looks fine in principle, but there is some polishing needed. I especially think that we need to carefully examine all failure scenarios and document the expected behaviour as part of the API. After all, it's an API to improve the guarantee of correct execution, so we need to be fairly precise there.

On Fri, Sep 10, 2010 at 05:00:07PM +0100, Daniel P. Berrange wrote:
/** * virLockManager: * * A lock manager is a process that will supervise another * process. It will obtain & hold locks/leases for resources * the supervised process uses */
typedef struct virLockManager virLockManager; typedef virLockManager *virLockManagerPtr;
typedef enum { VIR_LOCK_MANAGER_RESOURCE_READONLY = (1 << 0), VIR_LOCK_MANAGER_RESOURCE_SHARED = (1 << 1), } virLockManagerResourceFlags;
enum { VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK = 0, } virLockManagerResourceType;
enum { VIR_LOCK_MANAGER_NEW_DOMAIN = 0, } virLockManagerNewType;
enum { VIR_LOCK_MANAGER_NEW_MIGRATE = (1 << 0), VIR_LOCK_MANAGER_NEW_ATTACH = (1 << 1), } virLockManagerNewFlags;
enum { VIR_LOCK_MANAGER_MIGRATE_CANCEL = (1 << 0); } virLockManagerCompleteMigrateFlags;
/** * virLockManagerNew: * @driver: the driver implementation to use * @type: the type of process to be supervised * @flags: one of virLockManagerNewFlags * * Create a new context to supervise a process, usually * a virtual machine. For a normal startup, flags can * be 0. For incoming migration, use VIR_LOCK_MANAGER_NEW_MIGRATE * * Returns a new lock manager context
or NULL in case of failure
*/ virLockManagerPtr virLockManagerNew(virLockManagerDriverPtr driver, unsigned int type, unsigned int flags);
What other use cases for @type do you envision, besides a process used for running a domain?
/** * virLockManagerSetParameter: * @manager: the lock manager context * @key: the parameter name * @value: the parameter value * * Set a configuration parameter for the managed process. * A process of VIR_LOCK_MANAGER_START_DOMAIN will be * given at least 3 parameters: * * - id: the domain unique id * - uuid: the domain uuid * - name: the domain name * * There may be other parameters specific to the lock manager * plugin that are provided for the managed process * * This should only be called prior to the supervised process * being started. Setting parameters may change the set of * argv returned by virLockManagerGetSupervisorArgs. */ int virLockManagerSetParameter(virLockManagerPtr manager, const char *key, const char *value);
Hum ... this API feels a bit weak, I guess from the attempt at being flexible. I think we should define precisely the strings to be passed as keys, because this being an internal API I don't see any point in keeping floating key values. If we know the names of the keys used by the drivers, the drivers being internal, then I would prefer to use an enum for the key.

The error handling needs to be specified; I think there are 3 cases:

 - success
 - a hard error, like an allocation error
 - the lock manager doesn't know/use the key

We could explicitly merge the last one with success, but this should be documented, to be sure it's the equivalent of a no-op then.
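i.e. something along the lines of this sketch (names purely illustrative):

typedef enum {
    VIR_LOCK_MANAGER_PARAM_ID,    /* the domain unique id */
    VIR_LOCK_MANAGER_PARAM_UUID,  /* the domain uuid */
    VIR_LOCK_MANAGER_PARAM_NAME,  /* the domain name */
} virLockManagerParameter;

/* 0 on success, -1 on hard error (e.g. allocation failure); a key the
 * plugin does not use would either be a documented no-op or a third,
 * distinct return value, per the three cases above. */
int virLockManagerSetParameter(virLockManagerPtr manager,
                               virLockManagerParameter key,
                               const char *value);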
/** * virLockManagerRegisterResource: * @manager: the lock manager context * @type: the resource type virLockManagerResourceType * @name: the resource name * @flags: the resource access flags * * Register a resource that is required when the process * starts up. This may only be called prior to the process * being started. The format of @name varies according to * the resource @type. A VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK * will have a fully qualified file path. * * If no flags are given, the resource is assumed to be * used in exclusive, read-write mode. Access can be * relaxed to readonly, or shared read-write. */ int virLockManagerRegisterResource(virLockManagerPtr manager, unsigned int type, const gchar *name,
why gchar here instead of char ?
unsigned int flags);
I think the failure mode needs to be made clear here. Basically an error should mean that the domain creation aborts, and it's essentially an internal error, since we don't expect the lock manager to take the lock just now (that should be spelled out - would the lock manager be allowed to try to grab the resource immediately?). Errors could come from a parameter being invalid (an unrecognized flag for example), an internal allocation error, or calling at the wrong time. I think in all cases that means we should abort creation.
/** * virLockManagerAcquireResource: * @manager: the lock manager context * @type: the resource type virLockManagerResourceType * @name: the resource name * @flags: the resource access flags * * Dynamically acquire a resource for a running process. * This may only be called once the process has been * started. The format of @name varies according to * the resource @type. A VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK * will have a fully qualified file path. * * If no flags are given, the resource is assumed to be * used in exclusive, read-write mode. Access can be * relaxed to readonly, or shared read-write. */ int virLockManagerAcquireResource(virLockManagerPtr manager, const gchar *file,
why gchar here instead of char ?
unsigned int flags);
Again the failure mode needs to be clarified. I see 4 kinds of errors possible:

 - parameter error
 - internal error (allocation for example)
 - operation being unsupported by the lock manager
 - failure to acquire the resource

I think in all cases, from the domain's POV, it's equivalent to a no-op and hence non-fatal for the domain, but this should be clarified.
/** * virLockManagerReleaseResource: * @manager: the lock manager context * @type: the resource type virLockManagerResourceType * @name: the resource name * @flags: the resource access flags * * Dynamically release a resource for a running process. * This may only be called after the process has been * started. The format of @name varies according to * the resource @type. A VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK * will have a fully qualified file path. * * If no flags are given, the resource is assumed to be * used in exclusive, read-write mode. Access can be * relaxed to readonly, or shared read-write. */ int virLockManagerReleaseResource(virLockManagerPtr manager, const gchar *file,
why gchar here instead of char ?
unsigned int flags);
/** * virLockManager:
GetSupervisorPath
* @manager: the lock manager context * * Retrieve the path of the supervisor binary */ char *virLockManagerGetSupervisorPath(virLockManagerPtr manager);
What if the lock manager is in-libvirtd? Return NULL? There is also the possibility of a memory allocation error, so that's another possible failure.
/** * virLockManager: * @manager: the lock manager context * * Retrieve the security context of the supervisor binary */ char *virLockManagerGetSupervisorContext(virLockManagerPtr manager);
/** * virLockManager: * @manager: the lock manager context * @argv: returned argument values * @argc: returned argument count
* @flags: unused yet use 0
* * Retrieve custom argv to pass to the supervisor binary * ahead of the normal process binary & argv. It is recommended * that the last argv be '--' to allow reliable determination * of the last supervisor arg. */ int virLockManagerGetSupervisorArgs(virLockManagerPtr manager, char ***argv, int *argc, unsigned int flags);
/** * virLockManagerPrepareMigrate: * @manager: the lock manager context * @targetURI: destination of the migration * * Notify the supevisor that the process is about to be migrated * to another host at @targetURI */ int virLockManagerPrepareMigrate(virLockManagerPtr manager, const char *targetURI, unsigned int flags); /** * virLockManagerCompleteMigrateIn: * @manager: the lock manager context * @sourceURI: the origin of the migration
* @flags: combination of virLockManagerCompleteMigrateFlags
* * Notify the supervisor that the process has completed * migration. If the migration is aborted, then the @flags * should be VIR_LOCK_MANAGER_MIGRATE_CANCEL */ int virLockManagerCompleteMigrateIn(virLockManagerPtr manager, const char *sourceURI, unsigned int flags);
We should specify that this is to be called only on the migration target node, right? What is the error handling, what would failure indicate there ?
/** * virLockManagerCompleteMigrateOut: * @manager: the lock manager context * @targetURI: the destination of the migration
* @flags: combination of virLockManagerCompleteMigrateFlags
* * Notify the supervisor that the process has completed * migration. If the migration is aborted, then the @flags * should be VIR_LOCK_MANAGER_MIGRATE_CANCEL */ int virLockManagerCompleteMigrateOut(virLockManagerPtr manager, const char *targetURI, unsigned int flags);
We should specify that this is to be called only on the migration source node, right? What is the error handling, what would failure indicate there ?
/** * virLockManagerGetChild: * @manager: the lock manager context
* @flags: unused yet use 0
* * Obtain the PID of the managed process */ int virLockManagerGetChild(virLockManagerPtr manager, pid_t *pid, unsigned int flags);
/** * virLockManager:
Shutdown
* @manager: the lock manager context
* @flags: unused yet use 0
* * Inform the supervisor that the supervised process has * been, or can be stopped. */ int virLockManagerShutdown(virLockManagerPtr manager, unsigned int flags);
Apart from errors in the parameters, do we envision any failure? It seems to me this could be called multiple times in a row without prejudice; if that's the case then we should document it.
/** * virLockManager:
Free
* @manager: the lock manager context * * Release resources. If the supervised process is still * running, it will be killed with prejudice */ void virLockManagerFree(virLockManagerPtr manager);
Same as for virLockManagerShutdown - what would be the failure mode? But the most important point, I guess, is: does the lock manager provide a way to tell it's done with a guest? We could have scenarios where the guest disappears but the lock manager is hung, for example trying to contact a disappeared network device. The domain being down doesn't mean the lock manager is.

As said, I think the API in general is a nice abstraction for what we need, but the devil really is in the details of the operational model, and that must include the error cases. There are many ways something may go wrong, but we should have reliable expectations about how this is handled and reported across multiple implementations.

Daniel

typedef struct _virLockManagerDriver virLockManagerDriver;
typedef virLockManagerDriver *virLockManagerDriverPtr;

/* Which callbacks are supported */
typedef enum {
    VIR_LOCK_MANAGER_DRIVER_RESOURCES = (1 << 0),
    VIR_LOCK_MANAGER_DRIVER_MIGRATE = (1 << 0),
    VIR_LOCK_MANAGER_DRIVER_HOTPLUG = (1 << 1),
};

struct _virLockManagerDriver {
    /**
     * @version: the newest implemented plugin ABI version
     * @flags: the implemented plugin optional extras
     */
    unsigned int version;
    unsigned int flags;

    /**
     * load_drv:
     * @version: the libvirt requested plugin ABI version
     * @flags: the libvirt requested plugin optional extras
     *
     * Allow the plugin to validate the libvirt requested plugin
     * version / flags. This allows it to reject use of versions
     * which are too old. A plugin may be loaded multiple times,
     * for different libvirt drivers
     *
     * Returns -1 if the requested version/flags were inadequate
     */
    int (*load_drv)(unsigned int version, unsigned int flags);

    /**
     * unload_drv:
     *
     * Called to release any resources prior to the plugin being
     * unloaded from memory. Returns -1 to prevent plugin unloading.
     */
    int (*load_drv)(void);

    /* Mandatory callbacks */
    virLockManagerPtr (*new_drv)(unsigned int type, unsigned int flags);
    int (*set_parameter_drv)(virLockManagerPtr man, const char *key, const char *value);
    char *(*get_supervisor_path_drv)(virLockManagerPtr man);
    char *(*get_supervisor_context_drv)(virLockManagerPtr man);
    int (*get_supervisor_args_drv)(virLockManagerPtr man, char ***argv, int *argc, unsigned int flags);
    int (*connect_drv)(virLockManagerPtr man, unsigned int flags);
    int (*startup_drv)(virLockManagerPtr man, pid_t *pid, unsigned int flags);
    int (*shutdown_drv)(virLockManagerPtr man, unsigned int flags);
    void (*free_drv)(virLockManagerPtr man);
    /* End of mandatory callbacks */

    /* Only if VIR_LOCK_MANAGER_DRIVER_RESOURCES */
    int (*set_file_drv)(virLockManagerPtr man, const char *file, unsigned int flags);

    /* Only if VIR_LOCK_MANAGER_DRIVER_HOTPLUG */
    int (*add_file_drv)(virLockManagerPtr man, const char *file, unsigned int flags);
    int (*remove_file_drv)(virLockManagerPtr man, const char *file, unsigned int flags);

    /* Only if VIR_LOCK_MANAGER_DRIVER_MIGRATE */
    int (*prepare_migrate_drv)(virLockManagerPtr man, const gchar *targetURI, unsigned int flags);
    int (*complete_migrate_out_drv)(virLockManagerPtr man, const gchar *targetURI, unsigned int flags);
    int (*complete_migrate_in_drv)(virLockManagerPtr man, const gchar *sourceURI, unsigned int flags);
};

/**
 * virLockManagerPluginDriver:
 *
 * The signature of the function for obtaining a lock manager
 * driver from a dlopen()d plugin module. The plugin must contain
 * a symbol called "plugin_init" with this signature
 */
typedef virLockManagerDriverPtr (*virLockManagerPluginDriver)(void);

/**
 * virLockManagerPluginLoad:
 * @name: the name of the plugin
 *
 * Attempt to load the plugin $(libdir)/libvirt/lock-manager/@name.so
 * The plugin driver entry point will be resolved & invoked to obtain
 * the lock manager driver
 */
virLockManagerPluginPtr virLockManagerPluginLoad(const char *name);

int virLockManagerPluginUnload(virLockManagerPluginPtr);
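For illustration, a skeleton of what a trivial plugin module might look like against this contract. Everything below is a sketch; the only thing the text above mandates is the "plugin_init" symbol, and I'm assuming the second callback slot is really meant to be named unload_drv:

/* hypothetical nop-lock-manager.c, installed as
 * $(libdir)/libvirt/lock-manager/nop.so */

#define MY_ABI_VERSION 1   /* illustrative; real versioning is TBD */

static int nop_load(unsigned int version, unsigned int flags)
{
    /* refuse to run against a libvirt older than we understand */
    return version >= MY_ABI_VERSION ? 0 : -1;
}

static int nop_unload(void)
{
    return 0;   /* nothing to release, allow unloading */
}

static virLockManagerDriver nop_driver = {
    .version    = MY_ABI_VERSION,
    .flags      = 0,          /* no resources/migrate/hotplug support */
    .load_drv   = nop_load,
    .unload_drv = nop_unload,
    /* new_drv, set_parameter_drv, ... would be filled in likewise */
};

/* Entry point resolved by virLockManagerPluginLoad() */
virLockManagerDriverPtr plugin_init(void)
{
    return &nop_driver;
}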

On 09/10/2010 10:00 AM, Daniel P. Berrange wrote:
typedef struct _virLockManagerDriver virLockManagerDriver; typedef virLockManagerDriver *virLockManagerDriverPtr;
/* Which callbacks are supported */ typedef enum { VIR_LOCK_MANAGER_DRIVER_RESOURCES = (1<< 0), VIR_LOCK_MANAGER_DRIVER_MIGRATE = (1<< 0),
Same value?
VIR_LOCK_MANAGER_DRIVER_HOTPLUG = (1<< 1), };
struct _virLockManagerDriver { /** * @version: the newest implemented plugin ABI version * @flags: the implemented plugin optional extras */ unsigned int version; unsigned int flags;
/** * load_drv: * @version: the libvirt requested plugin ABI version * @flags: the libvirt requested plugin optional extras * * Allow the plugin to validate the libvirt requested * plugin version / flags. This allows it to reject * use of versions which are too old. A plugin may * be loaded multiple times, for different libvirt * drivers * * Returns -1 if the requested version/flags were inadequate */ int (*load_drv)(unsigned int version, unsigned int flags);
/** * unload_drv: * * Called to release any resources prior to the plugin * being unloaded from memory. Returns -1 to prevent * plugin unloading. */ int (*load_drv)(void);
s/load_drv/unload_drv/ or this won't compile. Are these two callbacks mandatory, or can they be NULL in the case that the driver only supports exactly what it recorded in version/flags?
/** * virLockManagerPluginLoad: * @name: the name of the plugin * * Attempt to load the plugin $(libdir)/libvirt/lock-manager/@name.so * The plugin driver entry point will be resolved& invoked to obtain * the lock manager driver
Returns NULL on failure, but is there any way to distinguish between levels of failure (file not found vs. plugin_init failed vs. load_drv callback failed), since that might be useful in diagnosing the problem?
*/ virLockManagerPluginPtr virLockManagerPluginLoad(const char *name);
No documentation for virLockManagerPluginUnload?
int virLockManagerPluginUnload(virLockManagerPluginPtr);
-- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On Fri, Sep 10, 2010 at 02:01:16PM -0600, Eric Blake wrote:
On 09/10/2010 10:00 AM, Daniel P. Berrange wrote:
typedef struct _virLockManagerDriver virLockManagerDriver; typedef virLockManagerDriver *virLockManagerDriverPtr;
/* Which callbacks are supported */ typedef enum { VIR_LOCK_MANAGER_DRIVER_RESOURCES = (1<< 0), VIR_LOCK_MANAGER_DRIVER_MIGRATE = (1<< 0),
Same value?
VIR_LOCK_MANAGER_DRIVER_HOTPLUG = (1<< 1), };
struct _virLockManagerDriver { /** * @version: the newest implemented plugin ABI version * @flags: the implemented plugin optional extras */ unsigned int version; unsigned int flags;
/** * load_drv: * @version: the libvirt requested plugin ABI version * @flags: the libvirt requested plugin optional extras * * Allow the plugin to validate the libvirt requested * plugin version / flags. This allows it to reject * use of versions which are too old. A plugin may * be loaded multiple times, for different libvirt * drivers * * Returns -1 if the requested version/flags were inadequate */ int (*load_drv)(unsigned int version, unsigned int flags);
/** * unload_drv: * * Called to release any resources prior to the plugin * being unloaded from memory. Returns -1 to prevent * plugin unloading. */ int (*load_drv)(void);
s/load_drv/unload_drv/ or this won't compile. Are these two callbacks mandatory, or can they be NULL in the case that the driver only supports exactly what it recorded in version/flags?
/** * virLockManagerPluginLoad: * @name: the name of the plugin * * Attempt to load the plugin $(libdir)/libvirt/lock-manager/@name.so * The plugin driver entry point will be resolved& invoked to obtain * the lock manager driver
Returns NULL on failure, but is there any way to distinguish between levels of failure (file not found vs. plugin_init failed vs. load_drv callback failed), since that might be useful in diagnosing the problem?
We'll report errors via the normal libvirt error & logging APIs so that should still be diagnosable.

Daniel

[Resurrecting this old thread which has just been brought to my attention] On Fri, Sep 10, 2010 at 05:00:52PM +0100, Daniel P. Berrange wrote:
typedef struct _virLockManagerDriver virLockManagerDriver; typedef virLockManagerDriver *virLockManagerDriverPtr;
/* Which callbacks are supported */ typedef enum { VIR_LOCK_MANAGER_DRIVER_RESOURCES = (1 << 0), VIR_LOCK_MANAGER_DRIVER_MIGRATE = (1 << 0), VIR_LOCK_MANAGER_DRIVER_HOTPLUG = (1 << 1), };
struct _virLockManagerDriver { /** * @version: the newest implemented plugin ABI version * @flags: the implemented plugin optional extras */ unsigned int version; unsigned int flags;
/** * load_drv: * @version: the libvirt requested plugin ABI version * @flags: the libvirt requested plugin optional extras * * Allow the plugin to validate the libvirt requested * plugin version / flags. This allows it to reject * use of versions which are too old. A plugin may * be loaded multiple times, for different libvirt * drivers * * Returns -1 if the requested version/flags were inadequate */ int (*load_drv)(unsigned int version, unsigned int flags);
/** * unload_drv: * * Called to release any resources prior to the plugin * being unloaded from memory. Returns -1 to prevent * plugin unloading. */ int (*[un]load_drv)(void);
There are various user-unfriendly aspects to this:

(1) If load_drv fails, you've left me up the creek. What can I as a caller do? Tell the user to upgrade their software? Carry on regardless?

(2) Without compiler help it is hard to call load_drv/unload_drv at the right time in a thread-safe way.

(3) Having all the functions buried inside a structure means I cannot use ordinary techniques (ie. dlsym) to find out if a function is supported. Because I cannot use dlsym, I'm forced to use hard-coded ABI checks!

I think this is better done using the techniques that ELF or any self-respecting dynamic loader already provides:

Use __attribute__((constructor)) and __attribute__((destructor)) to automatically and safely start and stop the lock manager (solves (2)).

Guarantee a minimum ABI in the first version (solves (1)).

Provide top-level functions in the lockmgr.so which correspond to functions I can call directly, ie:

    virLockManagerDoFoo (virLockManagerPtr *impl, int foo_arg);

and let me find out if they are supported by calling dlsym (solves (1) & (3)). If you need to change an API, add another function:

    virLockManagerDoFoo2 (virLockManagerPtr *impl, int foo_arg, int oops_we_forgot_this_arg);

As for the specifics, where is the lock manager DSO going to sit in the filesystem?

Thanks for explaining the specifics of this on IRC. It currently looks like this:

    impl->load_drv(LIBVIRT_LOCK_VERSION, 0);
    impl->new_drv(VIR_LOCK_MANAGER_OBJECT_TYPE_DOMAIN, 0);
    impl->set_parameter_drv("id", 2);
    impl->set_parameter_drv("name", "foo-guest");
    impl->set_parameter_drv("uuid", "c7a2edbd-edaf-9455-926a-d65c16db1809");
    impl->acquire_resource(VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK, "/some/big/disk/path", 0);
    impl->acquire_resource(VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK, "/some/big/iso/file/path", VIR_LOCK_MANAGER_RESOURCE_READONLY);
    if (!(impl->startup_drv()))
        ...cannot obtain lock, so bail out...
    ...run QEMU...
    impl->shutdown_drv();

The thing here is that I have to translate the XML into something that the lock manager wants. Why not let me just pass the XML? Or the virDomainPtr?

Rich.
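i.e. callers could do capability discovery with nothing more than this kind of sketch (paths and symbol names invented for illustration):

#include <dlfcn.h>
#include <stdio.h>

typedef int (*do_foo_fn)(void *impl, int foo_arg);

int main(void)
{
    void *handle = dlopen("/usr/lib/libvirt/lock-manager/lockmgr.so",
                          RTLD_NOW | RTLD_LOCAL);
    if (!handle) {
        fprintf(stderr, "%s\n", dlerror());
        return 1;
    }

    /* Capability discovery is just "does the symbol exist?" */
    do_foo_fn do_foo = (do_foo_fn)dlsym(handle, "virLockManagerDoFoo");
    if (!do_foo)
        fprintf(stderr, "DoFoo not supported by this lock manager\n");

    dlclose(handle);
    return 0;
}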

On Wed, Nov 10, 2010 at 03:01:01PM +0000, Richard W.M. Jones wrote:
[Resurrecting this old thread which has just been brought to my attention]
On Fri, Sep 10, 2010 at 05:00:52PM +0100, Daniel P. Berrange wrote:
typedef struct _virLockManagerDriver virLockManagerDriver; typedef virLockManagerDriver *virLockManagerDriverPtr;
/* Which callbacks are supported */ typedef enum { VIR_LOCK_MANAGER_DRIVER_RESOURCES = (1 << 0), VIR_LOCK_MANAGER_DRIVER_MIGRATE = (1 << 0), VIR_LOCK_MANAGER_DRIVER_HOTPLUG = (1 << 1), };
struct _virLockManagerDriver { /** * @version: the newest implemented plugin ABI version * @flags: the implemented plugin optional extras */ unsigned int version; unsigned int flags;
/** * load_drv: * @version: the libvirt requested plugin ABI version * @flags: the libvirt requested plugin optional extras * * Allow the plugin to validate the libvirt requested * plugin version / flags. This allows it to reject * use of versions which are too old. A plugin may * be loaded multiple times, for different libvirt * drivers * * Returns -1 if the requested version/flags were inadequate */ int (*load_drv)(unsigned int version, unsigned int flags);
/** * unload_drv: * * Called to release any resources prior to the plugin * being unloaded from memory. Returns -1 to prevent * plugin unloading. */ int (*[un]load_drv)(void);
There are various user-unfriendly aspects to this:
(1) If load_drv fails, you've left me up the creek. What can I as a caller do? Tell the user to upgrade their software? Carry on regardless?
The load_drv() method there is to allow the plugin implementation to block its use with a libvirtd that is too old. eg, say we add a new API 'foo', and the plugin implements that API. It wants to block its use in any libvirt which doesn't know about the 'foo' entry point, because such use could be dangerous to safety. The load_drv() method allows that to be handled. If load_drv() fails libvirt/libguestfs *must* refuse to run any VMs.
(2) Without compiler help it is hard to call load_drv/unload_drv at the right time in a thread-safe way.
In libvirt context they are loaded when the QEMU driver is initialized in libvirtd which is still single threaded. In libguestfs, you've no global "driver" state, so I'd suggest it is easiest to just call load_drv before starting the VM, and unload_drv after stopping it. The API contract requires that a plugin needs to cope with being loaded multiple times.
(3) Having all the functions buried inside a structure means I cannot use ordinary techniques (ie. dlsym) to find out if a function is supported. Because I cannot use dlsym, I'm forced to use hard-coded ABI checks!
You're thinking only about libvirtd checking what a plugin supports. The load_drv() method is about the *opposite*, allowing the plugin to check what libvirtd supports. You can't do that with dlsym, or __attribute__().
I think this is better done using the techniques that ELF or any self-respecting dynamic loader already provides:
Use __attribute__((constructor)) and __attribute__((destructor)) to automatically and safely start and stop the lock manager (solves (2)).
Guarantee a minimum ABI in the first version (solves (1)).
In addition to the reasons already mentioned for constructor not being able to replace load_drv, "destructor" can't replace the unload_drv method. The unload_drv method is there to allow the plugin to refuse to allow module unloading from the app. The load/unload API entry points aren't really intended for one-time module state initialization. They are for doing compatibility and safety checks, which require parameters & return values that you can't do with constructors. If a plugin does want to do global state initialization then it can of course use __attribute__((constructor)) and __attribute__((destructor)), and leave load_drv/unload_drv as no-ops.
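i.e. a plugin could combine the two along these lines (sketch only):

/* One-time module state setup/teardown handled by the dynamic loader,
 * leaving load_drv/unload_drv free to do nothing but compatibility checks. */
static void __attribute__((constructor)) my_plugin_ctor(void)
{
    /* initialise global plugin state here */
}

static void __attribute__((destructor)) my_plugin_dtor(void)
{
    /* tear down global plugin state here */
}

static int my_load(unsigned int version, unsigned int flags)
{
    return 0;   /* version/flags checks only; no state setup */
}

static int my_unload(void)
{
    return 0;   /* nothing blocks unloading */
}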
Provide top-level functions in the lockmgr.so which correspond to functions I can call directly, ie:
virLockManagerDoFoo (virLockManagerPtr *impl, int foo_arg);
and let me find out if they are supported by calling dlsym (solves (1) & (3)).
If you need to change an API, add another function:
virLockManagerDoFoo2 (virLockManagerPtr *impl, int foo_arg, int oops_we_forgot_this_arg);
As for the specifics, where is the lock manager DSO going to sit in the filesystem?
Not locked down yet, but likely $LIBDIR/libvirt/plugins/lock_manager/$NAME.so
Thanks for explaining the specifics of this on IRC. It currently looks like this:
impl->load_drv(LIBVIRT_LOCK_VERSION, 0);
impl->new_drv(VIR_LOCK_MANAGER_OBJECT_TYPE_DOMAIN, 0);
impl->set_parameter_drv("id", 2); impl->set_parameter_drv("name", "foo-guest'); impl->set_parameter_drv("uuid", "c7a2edbd-edaf-9455-926a-d65c16db1809");
impl->acquire_resource(VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK, "/some/big/disk/path", 0);
impl->acquire_resource(VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK, "/some/big/iso/file/path", VIR_LOCK_MANAGER_RESOURCE_READONLY);
if (!(impl->startup_drv()) ...cannot obtain lock, so bail out...
...run QEMU...
impl->shutdown_drv
The thing here is that I have to translate the XML into something that the lock manager wants. Why not let me just pass the XML? Or the virDomainPtr?
At the point where the libvirt drivers invoke the plugin, there is no virDomainPtr available, only the internal virDomainDefPtr objects, which aren't something we want to expose since they're not ABI stable. In addition, from the context where the plugin executes, it likely won't be able to use the libvirt APIs safely without causing deadlock in the libvirt drivers.

We aren't passing the XML because we didn't want the lock manager implementations to all have to re-implement XML parsing, and again the libvirt driver code doesn't have the XML around at this point.

Regards,
Daniel

On Wed, Nov 10, 2010 at 03:52:49PM +0000, Daniel P. Berrange wrote:
The load_drv() method there is to allow the plugin implementation to block its use with a libvirtd that is too old. eg, say we add a new API 'foo', and the plugin implements that API. It wants to block its use in any libvirt which doesn't know about the 'foo'
What sort of thing could 'foo' be?
entry point, because such use could be dangerous to safety.
You should just rename the lock manager .so file in this case. But it'd be best to write an enduring minimum API and ABI now to avoid having to do this.
The load_drv() method allows that to be handled. If load_drv() fails libvirt/libguestfs *must* refuse to run any VMs.
Well, libguestfs has other alternatives. Ignoring the lock manager is one, allowing read-only calls to proceed is another one.
(2) Without compiler help it is hard to call load_drv/unload_drv at the right time in a thread-safe way.
In libvirt context they are loaded when the QEMU driver is initialized in libvirtd which is still single threaded.
In libguestfs, you've no global "driver" state, so I'd suggest it is easiest to just call load_drv before starting the VM, and unload_drv after stopping it. The API contract requires that a plugin needs to cope with being loaded multiple times.
There are still issues with this. Multiple threads can call guestfs_launch (on different handles), resulting in reentrant calls to load_drv. I'm not convinced that the API as proposed is safe against this sort of thing: http://lists.fedoraproject.org/pipermail/devel/2010-January/129117.html
(3) Having all the functions buried inside a structure means I cannot use ordinary techniques (ie. dlsym) to find out if a function is supported. Because I cannot use dlsym, I'm forced to use hard-coded ABI checks!
You're thinking only about libvirtd checking what a plugin supports. The load_drv() method is about the *opposite*, allowing the plugin to check what libvirtd supports. You can't do that with dlsym, or __attribute__().
You can by renaming the lock manager DSO. Still, better to avoid having to do that by having at least a minimum stable ABI which you support forever.
In addition to the reasons already mentioned for why a constructor can't replace load_drv, a "destructor" can't replace the unload_drv method. The unload_drv method is there to allow the plugin to refuse module unloading by the app.
Why not just pause in the resource release function (shutdown_drv is it?) I'm still unclear what sort of locking would allow you to release the resource but still be unable to unload the module.
The thing here is that I have to translate the XML into something that the lock manager wants. Why not let me just pass the XML? Or the virDomainPtr?
At the point where libvirt drivers invoke the plugin, there is no virDomainPtr in available, only the internal objects virDomainDefPtr which aren't something we want to expose since they're not ABI stable. In addition from the context where the plugin executes, it likely won't be able to use the libvirt APIs safely without causing deadlock in libvirt drivers.
It isn't passing the XML because we didn't want the lock manager implementations to all have to re-implement XML parsing, and again the libvirt driver code doesn't have the XML around at this point.
This is unfriendly to other callers ... Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. http://et.redhat.com/~rjones/virt-df/

On Wed, Nov 10, 2010 at 04:17:13PM +0000, Richard W.M. Jones wrote:
On Wed, Nov 10, 2010 at 03:52:49PM +0000, Daniel P. Berrange wrote:
The load_drv() method there is to allow the plugin implementation to block its use with a libvirtd that is too old. eg, say we add a new API 'foo', and the plugin implements that API. It wants to block its use in any libvirt which doesn't know about the 'foo'
What sort of thing could 'foo' be?
entry point, because such use could be dangerous to safety.
You should just rename the lock manager .so file in this case. But it'd be best to write an enduring minimum API and ABI now to avoid having to do this.
The load_drv() method allows that to be handled. If load_drv() fails libvirt/libguestfs *must* refuse to run any VMs.
Well, libguestfs has other alternatives. Ignoring the lock manager is one, allowing read-only calls to proceed is another one.
Allowing read-only calls to continue isn't guaranteed to be safe. The lock manager plugin may wish to deny even readonly usage, to ensure that there aren't data consistency complications with NFS close-to-open behaviour during migration. If a lock manager is configured for use and it fails, not running any VMs is the only safe thing, until the admin has fixed the config problem.
(2) Without compiler help it is hard to call load_drv/unload_drv at the right time in a thread-safe way.
In libvirt context they are loaded when the QEMU driver is initialized in libvirtd which is still single threaded.
In libguestfs, you've no global "driver" state, so I'd suggest it is easiest to just call load_drv before starting the VM, and unload_drv after stopping it. The API contract requires that a plugin needs to cope with being loaded multiple times.
There are still issues with this. Multiple threads can call guestfs_launch (on different handles), resulting in reentrant calls to load_drv. I'm not convinced that the API as proposed is safe against this sort of thing:
http://lists.fedoraproject.org/pipermail/devel/2010-January/129117.html
As mentioned below, these APIs are not intended for doing global state management; instead they are for version/feature compatibility checks.
(3) Having all the functions buried inside a structure means I cannot use ordinary techniques (ie. dlsym) to find out if a function is supported. Because I cannot use dlsym, I'm forced to use hard-coded ABI checks!
You're thinking only about libvirtd checking what a plugin supports. The load_drv() method is about the *opposite*, allowing the plugin to check what libvirtd supports. You can't do that with dlsym, or __attribute__().
You can by renaming the lock manager DSO. Still, better to avoid having to do that by having at least a minimum stable ABI which you support forever.
There can be multiple lock manager plugin impls, with different requirements. Renaming the DSO impacts all implementations, whereas the load_drv() callback is intended to allow individual impls to define their own minimum requirements.
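A minimal sketch of such a per-plugin check (the version constant name is a placeholder for whatever the final header ends up defining):

  /* Hypothetical plugin that relies on an optional 'foo' entry point only
   * present from lock API level 2 onwards. */
  #define SKETCH_MIN_LOCK_VERSION 2      /* assumed level that added 'foo' */

  static int
  fooAwareLoadDrv(unsigned int version, unsigned int flags)
  {
      (void)flags;
      if (version < SKETCH_MIN_LOCK_VERSION)
          return -1;   /* this libvirtd cannot drive us safely: refuse to load */
      return 0;
  }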
In addition to the reasons already mentioned for constructor not being able to replace load_drv, "destructor" can't replace the unload_drv method. The unload_drv method is there to allow the plugin to refuse to allow module unloading from the app.
Why not just pause in the resource release function (shutdown_drv is it?) I'm still unclear what sort of locking would allow you to release the resource but still be unable to unload the module.
It isn't about the locking preventing unload. There are a variety of scenarios in which it is unsafe to unload *any* dlopen'd library. eg, if the module has set a thread local variable with a destructor, it is not safe to unload the module unless all threads have terminated, or all thread locals have been cleaned up. The module has no way to ensure this, so the only safe option is to never dlclose() it, otherwise you'll get SIGBUS or SIGSEGV. The unload_drv() method return code is to allow for this scenario to be dealt with safely.
The thing here is that I have to translate the XML into something that the lock manager wants. Why not let me just pass the XML? Or the virDomainPtr?
At the point where libvirt drivers invoke the plugin, there is no virDomainPtr in available, only the internal objects virDomainDefPtr which aren't something we want to expose since they're not ABI stable. In addition from the context where the plugin executes, it likely won't be able to use the libvirt APIs safely without causing deadlock in libvirt drivers.
It isn't passing the XML because we didn't want the lock manager implementations to all have to re-implement XML parsing, and again the libvirt driver code doesn't have the XML around at this point.
This is unfriendly to other callers ...
Unfortunately this is a no-win situation. Either the plugin impls have to parse the XML, or the plugin callers have to. On balance there are more plugin impls than callers, and one of the callers already knows how to parse the XML, so I put the burden of XML parsing on the caller rather than the plugin impl. Regards, Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

On Wed, Nov 10, 2010 at 04:42:01PM +0000, Daniel P. Berrange wrote:
On Wed, Nov 10, 2010 at 04:17:13PM +0000, Richard W.M. Jones wrote:
On Wed, Nov 10, 2010 at 03:52:49PM +0000, Daniel P. Berrange wrote:
The load_drv() method there is to allow the plugin implementation to block its use with a libvirtd that is too old. eg, say we add a new API 'foo', and the plugin implements that API. It wants to block its use in any libvirt which doesn't know about the 'foo'
What sort of thing could 'foo' be?
entry point, because such use could be dangerous to safety.
You should just rename the lock manager .so file in this case. But it'd be best to write an enduring minimum API and ABI now to avoid having to do this.
The load_drv() method allows that to be handled. If load_drv() fails libvirt/libguestfs *must* refuse to run any VMs.
Well, libguestfs has other alternatives. Ignoring the lock manager is one, allowing read-only calls to proceed is another one.
Allowing read-only calls to continue isn't guarenteed to be safe. The lock manager plugin may wish to deny even readonly usage, to ensure that there aren't data consistency complication with NFS close-to-open behaviour during migration. If a lock manager is configured for use and it fails, not running any VMs is the only safe thing, until the admin has fixed the config problem.
This is entirely separate from load_drv though. You can deny read-only access when I try to acquire the resource. What I'm wondering, and you're not answering, is what future event are you anticipating which would make the outline API (http://pastebin.ca/1987261) need to be completely overhauled? I can't imagine it ...
There are still issues with this. Multiple threads can call guestfs_launch (on different handles), resulting in reentrant calls to load_drv. I'm not convinced that the API as proposed is safe against this sort of thing:
http://lists.fedoraproject.org/pipermail/devel/2010-January/129117.html
As mentioned below, these APIs are [not?] intended for doing global state management, instead they are for version/feature compatibility checks.
But back to my original point, if these fail I'm left SOL. This would prevent libguestfs from working under nearly every condition, and all I could suggest to users would be that something needs to be upgraded. A basic minimum always-supported API is a must here, unless there is genuinely some system out there that has such bizarre locking that we would need to change how this works completely, and if there is such a system perhaps we should consider it in the API now.
It isn't about the locking preventing unload. There are a variety of scenarios in which is it unsafe to unload *any* dlopen'd library. eg, if the module has set a thread local variable with a destructor, it is not safe to unload the module unless all threads have terminated, or all thread locals cleaned up. The module has no way to ensure this, so the only safe option is to never dlclose() it, otherwise you'll get SIGBUS or SIGSEGV.
This isn't true. An API which allowed futures should also allow those futures to be cancelled (from either the caller or callee side), and it should cancel them in the __attribute__((destructor)). In any case, the API proposed doesn't seem to have this.
This is unfriendly to other callers ...
Unfortunately this is a no-win situation. Either the plugin impls have to parse the XML, or the plugin callers have to. On balance there are more plugin impls than callers, and one of the callers already knows how to parse the XML, so I put the burden of XML parsing on the caller rather than the plugin impl.
That doesn't make it friendly. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones virt-p2v converts physical machines to virtual machines. Boot with a live CD or over the network (PXE) and turn machines into Xen guests. http://et.redhat.com/~rjones/virt-p2v

On Wed, Nov 10, 2010 at 05:29:28PM +0000, Richard W.M. Jones wrote:
On Wed, Nov 10, 2010 at 04:42:01PM +0000, Daniel P. Berrange wrote:
On Wed, Nov 10, 2010 at 04:17:13PM +0000, Richard W.M. Jones wrote:
On Wed, Nov 10, 2010 at 03:52:49PM +0000, Daniel P. Berrange wrote:
The load_drv() method there is to allow the plugin implementation to block its use with a libvirtd that is too old. eg, say we add a new API 'foo', and the plugin implements that API. It wants to block its use in any libvirt which doesn't know about the 'foo'
What sort of thing could 'foo' be?
entry point, because such use could be dangerous to safety.
You should just rename the lock manager .so file in this case. But it'd be best to write an enduring minimum API and ABI now to avoid having to do this.
The load_drv() method allows that to be handled. If load_drv() fails libvirt/libguestfs *must* refuse to run any VMs.
Well, libguestfs has other alternatives. Ignoring the lock manager is one, allowing read-only calls to proceed is another one.
Allowing read-only calls to continue isn't guarenteed to be safe. The lock manager plugin may wish to deny even readonly usage, to ensure that there aren't data consistency complication with NFS close-to-open behaviour during migration. If a lock manager is configured for use and it fails, not running any VMs is the only safe thing, until the admin has fixed the config problem.
This is entirely separate from load_drv though. You can deny read-only access when I try to acquire the resource.
What I'm wondering, and you're not answering, is what future event are you anticipating which would make the outline API (http://pastebin.ca/1987261) need to be completely overhauled? I can't imagine it ...
We can't predict the future, so I designed the plugin to give maximum flexibility to adapt to what it may bring. Making one API call after loading the plugin to check compatibility really isn't any burden, and so it's a worthwhile safety net for future change.
There are still issues with this. Multiple threads can call guestfs_launch (on different handles), resulting in reentrant calls to load_drv. I'm not convinced that the API as proposed is safe against this sort of thing:
http://lists.fedoraproject.org/pipermail/devel/2010-January/129117.html
As mentioned below, these APIs are [not?] intended for doing global state management, instead they are for version/feature compatibility checks.
But back to my original point, if these fail I'm left SOL. This would prevent libguestfs from working under nearly every condition, and all I could suggest to users would be that something needs to be upgraded.
You wouldn't expect this to ever fail in normal circumstances. RPM dependencies should ensure it doesn't get into a broken state. This is there as a safety net in case the admin does something stupid, so that we don't pose any risk to the safety of the disk images.
A basic minimum always-supported API is a must here, unless there is genuinely some system out there that has such bizarre locking that we would need to change how this works completely, and if there is such a system perhaps we should consider it in the API now.
This isn't about completely changing the locking scheme, it is about allowing extra APIs to be added to provide additional information. If a plugin wants to be able to rely on that additional info being present, it should be able to check that this is supported by the libvirtd instance upfront. This avoids a runtime failure at time of guest startup, disk hotplug, or worse, at guest migration where it is possibly already too late to cope.
It isn't about the locking preventing unload. There are a variety of scenarios in which is it unsafe to unload *any* dlopen'd library. eg, if the module has set a thread local variable with a destructor, it is not safe to unload the module unless all threads have terminated, or all thread locals cleaned up. The module has no way to ensure this, so the only safe option is to never dlclose() it, otherwise you'll get SIGBUS or SIGSEGV.
This isn't true. An API which allowed futures should also allow those futures to be cancelled (from either the caller or callee side), and it should cancel them in the __attribute__((destructor)). In any case, the API proposed doesn't seem to have this.
Unfortunately there are plenty of existing APIs out there which don't provide a way to cancel future callbacks. eg pthread_key_create has no way to guarantee that the destructor won't ever be called again, unless you can kill off all threads before unloading the module. Another example is the GObject type system which does not allow you to unregister modules. Thus to be safe, we need to allow plugins to indicate that they don't wish to be unloaded from memory. Regards, Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
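To make the pthread_key_create() case concrete, here is a rough sketch of a plugin that installs a thread-local destructor and therefore answers its unload callback with "do not dlclose() me"; the callback names are illustrative.

  #include <pthread.h>
  #include <stdlib.h>

  static pthread_key_t stateKey;
  static int haveKey;

  static void
  stateFree(void *ptr)
  {
      free(ptr);   /* may run later, in threads the plugin does not control */
  }

  static int
  exampleLoad(unsigned int version, unsigned int flags)
  {
      (void)version;
      (void)flags;
      if (pthread_key_create(&stateKey, stateFree) != 0)
          return -1;
      haveKey = 1;
      return 0;
  }

  /* Once the key exists, the destructor could fire after a dlclose(), so
   * the only safe answer is to refuse unloading. */
  static int
  exampleUnload(void)
  {
      return haveKey ? -1 : 0;
  }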

At libvirtd startup:

  driver = virLockManagerPluginLoad("sync-manager");

At libvirtd shutdown:

  virLockManagerPluginUnload(driver);

At guest startup:

  manager = virLockManagerNew(driver, VIR_LOCK_MANAGER_START_DOMAIN, 0);
  virLockManagerSetParameter(manager, "id", id);
  virLockManagerSetParameter(manager, "uuid", uuid);
  virLockManagerSetParameter(manager, "name", name);

  foreach disk
      virLockManagerRegisterResource(manager, VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK, disk.path, ..flags...);

  char **supervisorargv;
  int supervisorargc;

  supervisor = virLockManagerGetSupervisorPath(manager);
  virLockManagerGetSupervisorArgs(&supervisorargv, &supervisorargc);

  cmd = qemuBuildCommandLine(supervisor, supervisorargv, supervisorargc);

  supervisorpid = virCommandExec(cmd);

  if (!virLockManagerGetChild(manager, &qemupid))
      kill(supervisorpid); /* XXX or leave it running ??? */

At guest shutdown:

  ...send QEMU 'quit' monitor command, and/or kill(qemupid)...

  if (!virLockManagerShutdown(manager))
      kill(supervisorpid); /* XXX or leave it running ??? */

  virLockManagerFree(manager);

At libvirtd restart with running guests:

  foreach still running guest
      manager = virLockManagerNew(driver, VIR_LOCK_MANAGER_START_DOMAIN, VIR_LOCK_MANAGER_NEW_ATTACH);
      virLockManagerSetParameter(manager, "id", id);
      virLockManagerSetParameter(manager, "uuid", uuid);
      virLockManagerSetParameter(manager, "name", name);

      if (!virLockManagerGetChild(manager, &qemupid))
          kill(supervisorpid); /* XXX or leave it running ??? */

With disk hotplug:

  if (virLockManagerAcquireResource(manager, VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK, disk.path ..flags..))
      ...abort hotplug attempt ...

  ...hotplug the device...

With disk unhotplug:

  ...hotunplug the device...

  if (virLockManagerReleaseResource(manager, VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK, disk.path ..flags..))
      ...log warning ...

During migration:

1. On source host

   if (!virLockManagerPrepareMigrate(manager, hosturi))
       ..don't start migration..

2. On dest host

   manager = virLockManagerNew(driver, VIR_LOCK_MANAGER_START_DOMAIN, VIR_LOCK_MANAGER_NEW_MIGRATE);
   virLockManagerSetParameter(manager, "id", id);
   virLockManagerSetParameter(manager, "uuid", uuid);
   virLockManagerSetParameter(manager, "name", name);

   foreach disk
       virLockManagerRegisterResource(manager, VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK, disk.path, ..flags...);

   char **supervisorargv;
   int supervisorargc;

   supervisor = virLockManagerGetSupervisorPath(manager);
   virLockManagerGetSupervisorArgs(&supervisorargv, &supervisorargc);

   cmd = qemuBuildCommandLine(supervisor, supervisorargv, supervisorargc);

   supervisorpid = virCommandExec(cmd);

   if (!virLockManagerGetChild(manager, &qemupid))
       kill(supervisorpid); /* XXX or leave it running ??? */

3. Initiate migration in QEMU on source and wait for completion

4a. On failure

  4a1 On target

    virLockManagerCompleteMigrateIn(manager, VIR_LOCK_MANAGER_MIGRATE_CANCEL);
    virLockManagerShutdown(manager);
    virLockManagerFree(manager);

  4a2 On source

    virLockManagerCompleteMigrateIn(manager, VIR_LOCK_MANAGER_MIGRATE_CANCEL);

4b. On success

  4b1 On target

    virLockManagerCompleteMigrateIn(manager, 0);

  4b2 On source

    virLockManagerCompleteMigrateIn(manager, 0);
    virLockManagerShutdown(manager);
    virLockManagerFree(manager);

Notes:

- If a lock manager impl does just VM level leases, it can ignore all the resource paths at startup.
- If a lock manager impl does not support migrate it can return an error from all migrate calls
- If a lock manager impl does not support hotplug it can return an error from all resource acquire/release calls
-- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
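As one concrete reading of the first note (a sketch only, with an illustrative lock file location): a plugin doing whole-VM exclusion can take a single exclusive fcntl() lease keyed on the uuid parameter and treat every per-disk registration as a no-op.

  #include <fcntl.h>
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>

  /* One lock file per VM uuid; holding an exclusive lease on it is the
   * whole of the "VM level" protection. */
  static int
  acquireVmLease(const char *uuid)
  {
      char path[256];
      struct flock fl;
      int fd;

      snprintf(path, sizeof(path), "/var/run/example-lock/%s", uuid);
      if ((fd = open(path, O_RDWR | O_CREAT, 0600)) < 0)
          return -1;

      memset(&fl, 0, sizeof(fl));
      fl.l_type = F_WRLCK;           /* exclusive: one instance of this VM */
      fl.l_whence = SEEK_SET;        /* whole file (l_start = l_len = 0) */
      if (fcntl(fd, F_SETLK, &fl) < 0) {
          close(fd);
          return -1;                 /* held elsewhere: refuse to start */
      }
      return fd;                     /* keep open for the VM's lifetime */
  }

  /* Per-disk registration is simply ignored by this minimal impl. */
  static int
  registerResourceNoop(unsigned int type, const char *name, unsigned int flags)
  {
      (void)type; (void)name; (void)flags;
      return 0;
  }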

On 09/10/2010 10:01 AM, Daniel P. Berrange wrote:
At libvirtd startup:
driver = virLockManagerPluginLoad("sync-manager");
At libvirtd shtudown:
virLockManagerPluginUnload(driver)
Can you load more than one lock manager at a time, or just one active lock manager? How does a user configure which lock manager(s) to load when libvirtd is started?
At guest startup:
manager = virLockManagerNew(driver, VIR_LOCK_MANAGER_START_DOMAIN, 0); virLockManagerSetParameter(manager, "id", id); virLockManagerSetParameter(manager, "uuid", uuid); virLockManagerSetParameter(manager, "name", name);
foreach disk virLockManagerRegisterResource(manager, VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK, disk.path, ..flags...);
char **supervisorargv; int supervisorargc;
supervisor = virLockManagerGetSupervisorPath(manager); virLockManagerGetSupervisorArgs(&argv,&argc);
cmd = qemuBuildCommandLine(supervisor, supervisorargv, supervisorargv);
supervisorpid = virCommandExec(cmd);
if (!virLockManagerGetChild(manager,&qemupid)) kill(supervisorpid); /* XXX or leave it running ??? */
Would it be better to first try virLockManagerShutdown? And rather than a direct kill(), shouldn't this be virLockManagerFree?
At guest shutdown:
...send QEMU 'quit' monitor command, and/or kill(qemupid)...
if (!virLockManagerShutdown(manager)) kill(supervisorpid); /* XXX or leave it running ??? */
virLockManagerFree(manager);
At libvirtd restart with running guests:
foreach still running guest manager = virLockManagerNew(driver, VIR_LOCK_MANAGER_START_DOMAIN, VIR_LOCK_MANAGER_NEW_ATTACH); virLockManagerSetParameter(manager, "id", id); virLockManagerSetParameter(manager, "uuid", uuid); virLockManagerSetParameter(manager, "name", name);
if (!virLockManagerGetChild(manager,&qemupid)) kill(supervisorpid); /* XXX or leave it running ??? */
With disk hotplug:
if (virLockManagerAcquireResource(manager, VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK, disk.path ..flags..)) ...abort hotplug attempt ...
...hotplug the device...
With disk unhotplug:
...hotunplug the device...
if (virLockManagerReleaseResource(manager, VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK, disk.path ..flags..)) ...log warning ...
During migration:
1. On source host
if (!virLockManagerPrepareMigrate(manager, hosturi)) ..don't start migration..
2. On dest host
manager = virLockManagerNew(driver, VIR_LOCK_MANAGER_START_DOMAIN, VIR_LOCK_MANAGER_NEW_MIGRATE); virLockManagerSetParameter(manager, "id", id); virLockManagerSetParameter(manager, "uuid", uuid); virLockManagerSetParameter(manager, "name", name);
foreach disk virLockManagerRegisterResource(manager, VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK, disk.path, ..flags...);
So if there needs to be any relaxation of locks from exclusive to shared-write for the duration of the migration, that would be the responsibility of virLockManagerPrepareMigrate, and not done directly by libvirt?
char **supervisorargv; int supervisorargc;
supervisor = virLockManagerGetSupervisorPath(manager); virLockManagerGetSupervisorArgs(&argv,&argc);
cmd = qemuBuildCommandLine(supervisor, supervisorargv, supervisorargv);
supervisorpid = virCommandExec(cmd);
if (!virLockManagerGetChild(manager,&qemupid)) kill(supervisorpid); /* XXX or leave it running ??? */
3. Initiate migration in QEMU on source and wait for completion
4a. On failure
4a1 On target
virLockManagerCompleteMigrateIn(manager, VIR_LOCK_MANAGER_MIGRATE_CANCEL); virLockManagerShutdown(manager); virLockManagerFree(manager);
4a2 On source
virLockManagerCompleteMigrateIn(manager, VIR_LOCK_MANAGER_MIGRATE_CANCEL);
Wouldn't this be virLockManagerCompleteMigrateOut?
4b. On succcess
4b1 On target
virLockManagerCompleteMigrateIn(manager, 0);
42 On source
virLockManagerCompleteMigrateIn(manager, 0);
Likewise?
virLockManagerShutdown(manager); virLockManagerFree(manager);
Notes:
- If a lock manager impl does just VM level leases, it can ignore all the resource paths at startup.
- If a lock manager impl does not support migrate it can return an error from all migrate calls
- If a lock manger impl does not support hotplug it can return an error from all resource acquire/release calls
Overall, this looks workable to me. As proposed, this assumes a 1:1 relation between LockManager process and managed VMs. But I guess you can still have a central manager process that manages all the VMs, by having the lock manager plugin spawn a simple shim process that does all the communication with the central lock manager. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On Fri, Sep 10, 2010 at 02:39:41PM -0600, Eric Blake wrote:
On 09/10/2010 10:01 AM, Daniel P. Berrange wrote:
At libvirtd startup:
driver = virLockManagerPluginLoad("sync-manager");
At libvirtd shtudown:
virLockManagerPluginUnload(driver)
Can you load more than one lock manager at a time, or just one active lock manager? How does a user configure which lock manager(s) to load when libvirtd is started?
The intention is that any specific libvirt driver will only use one lock manager, but LXC vs QEMU vs UML could each use a different driver if required.
At guest startup:
manager = virLockManagerNew(driver, VIR_LOCK_MANAGER_START_DOMAIN, 0); virLockManagerSetParameter(manager, "id", id); virLockManagerSetParameter(manager, "uuid", uuid); virLockManagerSetParameter(manager, "name", name);
foreach disk virLockManagerRegisterResource(manager, VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK, disk.path, ..flags...);
char **supervisorargv; int supervisorargc;
supervisor = virLockManagerGetSupervisorPath(manager); virLockManagerGetSupervisorArgs(&argv,&argc);
cmd = qemuBuildCommandLine(supervisor, supervisorargv, supervisorargv);
supervisorpid = virCommandExec(cmd);
if (!virLockManagerGetChild(manager,&qemupid)) kill(supervisorpid); /* XXX or leave it running ??? */
Would it be better to first try virLockManagerShutdown? And rather than a direct kill(), shouldn't this be virLockManagerFree?
Yes I guess so
During migration:
1. On source host
if (!virLockManagerPrepareMigrate(manager, hosturi)) ..don't start migration..
2. On dest host
manager = virLockManagerNew(driver, VIR_LOCK_MANAGER_START_DOMAIN, VIR_LOCK_MANAGER_NEW_MIGRATE); virLockManagerSetParameter(manager, "id", id); virLockManagerSetParameter(manager, "uuid", uuid); virLockManagerSetParameter(manager, "name", name);
foreach disk virLockManagerRegisterResource(manager, VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK, disk.path, ..flags...);
So if there needs to be any relaxation of locks from exclusive to shared-write for the duration of the migration, that would be the responsibility of virLockManagerPrepareMigrate, and not done directly by libvirt?
As with my other reply on this topic, I didn't want to force a particular design / implementation strategy for migration, so I just put in actions at each key stage of migration. The driver impl can decide whether to do a plain release+reacquire, or use some kind of shared lock
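For instance, an fcntl()-based impl might take the shared-lock route and make its prepare-migrate step a simple lock downgrade, roughly as below (fd being the lease held since guest startup; a sketch only):

  #include <fcntl.h>
  #include <string.h>

  /* Downgrade an exclusive disk lease to shared for the migration window. */
  static int
  downgradeForMigrate(int fd)
  {
      struct flock fl;

      memset(&fl, 0, sizeof(fl));
      fl.l_whence = SEEK_SET;
      fl.l_type = F_RDLCK;              /* exclusive -> shared */
      return fcntl(fd, F_SETLK, &fl);
  }

  /* On a cancelled migration, promote it back to exclusive. */
  static int
  reacquireAfterCancel(int fd)
  {
      struct flock fl;

      memset(&fl, 0, sizeof(fl));
      fl.l_whence = SEEK_SET;
      fl.l_type = F_WRLCK;
      return fcntl(fd, F_SETLKW, &fl);  /* may block until the peer lets go */
  }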
char **supervisorargv; int supervisorargc;
supervisor = virLockManagerGetSupervisorPath(manager); virLockManagerGetSupervisorArgs(&argv,&argc);
cmd = qemuBuildCommandLine(supervisor, supervisorargv, supervisorargv);
supervisorpid = virCommandExec(cmd);
if (!virLockManagerGetChild(manager,&qemupid)) kill(supervisorpid); /* XXX or leave it running ??? */
3. Initiate migration in QEMU on source and wait for completion
4a. On failure
4a1 On target
virLockManagerCompleteMigrateIn(manager, VIR_LOCK_MANAGER_MIGRATE_CANCEL); virLockManagerShutdown(manager); virLockManagerFree(manager);
4a2 On source
virLockManagerCompleteMigrateIn(manager, VIR_LOCK_MANAGER_MIGRATE_CANCEL);
Wouldn't this be virLockManagerCompleteMigrateOut?
Oops, yes.
4b. On succcess
4b1 On target
virLockManagerCompleteMigrateIn(manager, 0);
42 On source
virLockManagerCompleteMigrateIn(manager, 0);
Likewise?
Yes
virLockManagerShutdown(manager); virLockManagerFree(manager);
Notes:
- If a lock manager impl does just VM level leases, it can ignore all the resource paths at startup.
- If a lock manager impl does not support migrate it can return an error from all migrate calls
- If a lock manger impl does not support hotplug it can return an error from all resource acquire/release calls
Overall, this looks workable to me. As proposed, this assumes a 1:1 relation between LockManager process and managed VMs. But I guess you can still have a central manager process that manages all the VMs, by having the lock manager plugin spawn a simple shim process that does all the communication with the central lock manager.
I could have designed it such that it didn't assume the presence of an angel process around each VM, but I think it is easier to be able to presume that there is one. It can be an incredibly thin stub if desired, so I don't think it'll be too onerous on implementations Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
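A stub of that shape could be as small as the sketch below: take the leases, fork the payload, hold the leases until it exits. The lease calls are trivial placeholders for whatever the plugin really does.

  #include <sys/types.h>
  #include <sys/wait.h>
  #include <unistd.h>

  /* Placeholder lease operations. */
  static int  acquireLeases(void) { return 0; }
  static void releaseLeases(void) { }

  int
  main(int argc, char **argv)
  {
      pid_t child;
      int status = 0;

      if (argc < 2)
          return 1;
      if (acquireLeases() < 0)
          return 1;                        /* no locks, no VM */

      if ((child = fork()) < 0) {
          releaseLeases();
          return 1;
      }
      if (child == 0) {
          execvp(argv[1], &argv[1]);       /* become QEMU (or any payload) */
          _exit(127);
      }

      waitpid(child, &status, 0);          /* leases live as long as the child */
      releaseLeases();
      return WIFEXITED(status) ? WEXITSTATUS(status) : 1;
  }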

On Mon, 2010-09-13 at 13:35 +0100, Daniel P. Berrange wrote:
On Fri, Sep 10, 2010 at 02:39:41PM -0600, Eric Blake wrote:
On 09/10/2010 10:01 AM, Daniel P. Berrange wrote:
At libvirtd startup:
driver = virLockManagerPluginLoad("sync-manager");
At libvirtd shtudown:
virLockManagerPluginUnload(driver)
Can you load more than one lock manager at a time, or just one active lock manager? How does a user configure which lock manager(s) to load when libvirtd is started?
The intention is that any specific libvirt driver will only use one lock manager, but LXC vs QEMU vs UML could each use a different driver if required.
At guest startup:
manager = virLockManagerNew(driver, VIR_LOCK_MANAGER_START_DOMAIN, 0); virLockManagerSetParameter(manager, "id", id); virLockManagerSetParameter(manager, "uuid", uuid); virLockManagerSetParameter(manager, "name", name);
foreach disk virLockManagerRegisterResource(manager, VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK, disk.path, ..flags...);
char **supervisorargv; int supervisorargc;
supervisor = virLockManagerGetSupervisorPath(manager); virLockManagerGetSupervisorArgs(&argv,&argc);
cmd = qemuBuildCommandLine(supervisor, supervisorargv, supervisorargv);
supervisorpid = virCommandExec(cmd);
if (!virLockManagerGetChild(manager,&qemupid)) kill(supervisorpid); /* XXX or leave it running ??? */
Would it be better to first try virLockManagerShutdown? And rather than a direct kill(), shouldn't this be virLockManagerFree?
Yes I guess so
During migration:
1. On source host
if (!virLockManagerPrepareMigrate(manager, hosturi)) ..don't start migration..
2. On dest host
manager = virLockManagerNew(driver, VIR_LOCK_MANAGER_START_DOMAIN, VIR_LOCK_MANAGER_NEW_MIGRATE); virLockManagerSetParameter(manager, "id", id); virLockManagerSetParameter(manager, "uuid", uuid); virLockManagerSetParameter(manager, "name", name);
foreach disk virLockManagerRegisterResource(manager, VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK, disk.path, ..flags...);
So if there needs to be any relaxation of locks from exclusive to shared-write for the duration of the migration, that would be the responsibility of virLockManagerPrepareMigrate, and not done directly by libvirt?
As with my other reply on this topic, I didn't want to force a particular design / implementation strategy for migration, so I just put in actions at each key stage of migration. The driver impl can decide whether todo a plain release+reacquire, or use some kind of shared lock
char **supervisorargv; int supervisorargc;
supervisor = virLockManagerGetSupervisorPath(manager); virLockManagerGetSupervisorArgs(&argv,&argc);
cmd = qemuBuildCommandLine(supervisor, supervisorargv, supervisorargv);
supervisorpid = virCommandExec(cmd);
if (!virLockManagerGetChild(manager,&qemupid)) kill(supervisorpid); /* XXX or leave it running ??? */
3. Initiate migration in QEMU on source and wait for completion
4a. On failure
4a1 On target
virLockManagerCompleteMigrateIn(manager, VIR_LOCK_MANAGER_MIGRATE_CANCEL); virLockManagerShutdown(manager); virLockManagerFree(manager);
4a2 On source
virLockManagerCompleteMigrateIn(manager, VIR_LOCK_MANAGER_MIGRATE_CANCEL);
Wouldn't this be virLockManagerCompleteMigrateOut?
Opps, yes.
4b. On succcess
4b1 On target
virLockManagerCompleteMigrateIn(manager, 0);
42 On source
virLockManagerCompleteMigrateIn(manager, 0);
Likewise?
Yes
virLockManagerShutdown(manager); virLockManagerFree(manager);
Notes:
- If a lock manager impl does just VM level leases, it can ignore all the resource paths at startup.
- If a lock manager impl does not support migrate it can return an error from all migrate calls
- If a lock manger impl does not support hotplug it can return an error from all resource acquire/release calls
Overall, this looks workable to me. As proposed, this assumes a 1:1 relation between LockManager process and managed VMs. But I guess you can still have a central manager process that manages all the VMs, by having the lock manager plugin spawn a simple shim process that does all the communication with the central lock manager.
I could have decided it such that it didn't assume presence of a angel process around each VM, but I think it is easier to be able to presume that there is one. It can be an incredibly thin stub if desired, so I don't think it'll be too onerous on implementations
We are looking into the possibility of not having a process manage a VM but rather having the sync_manager process register with a central daemon and exec into qemu (or anything else), so assuming there is a process per VM is essentially false. But the verb could be used for "unregistering" the current instance with the main manager, so the verb does have its use.
Furthermore, even if we decide to leave the current 'sync_manager process per child process' system as is for now, the general direction is a central daemon per host for managing all the leases and guarding all processes. So be sure to keep that in mind while assembling the API.
Daniel

On Mon, Sep 13, 2010 at 03:20:13PM +0200, Saggi Mizrahi wrote:
On Mon, 2010-09-13 at 13:35 +0100, Daniel P. Berrange wrote:
Overall, this looks workable to me. As proposed, this assumes a 1:1 relation between LockManager process and managed VMs. But I guess you can still have a central manager process that manages all the VMs, by having the lock manager plugin spawn a simple shim process that does all the communication with the central lock manager.
I could have decided it such that it didn't assume presence of a angel process around each VM, but I think it is easier to be able to presume that there is one. It can be an incredibly thin stub if desired, so I don't think it'll be too onerous on implementations
We are looking into the possibility of not having a process manage a VM but rather having the sync_manager process register with a central daemon and exec into qemu (or anything else) so assuming there is a process per VM is essentially false. But the verb could be used for "unregistering" the current instance with the main manager so the verb does have it use.
Further more, even if we decide to leave the current 'sync_manager process per child process' system as is for now. The general direction is a central deamon per host for managing all the leases and guarding all processes. So be sure to keep that in mind while assembling the API.
Having a single daemon per host that exec's the VMs is explicitly *not* something we intend to support because the QEMU process needs to inherit its process execution state from libvirtd. It is fundamental to the security architecture that processes are completely isolated the moment that libvirtd has spawned them. We don't want to offload all the security driver setup into a central lock manager daemon. Aside from this we also pass open file descriptors down from libvirtd to the QEMU daemon. Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

On Mon, Sep 13, 2010 at 03:20:13PM +0200, Saggi Mizrahi wrote:
On Mon, 2010-09-13 at 13:35 +0100, Daniel P. Berrange wrote:
Overall, this looks workable to me. As proposed, this assumes a 1:1 relation between LockManager process and managed VMs. But I guess you can still have a central manager process that manages all the VMs, by having the lock manager plugin spawn a simple shim process that does all the communication with the central lock manager.
I could have decided it such that it didn't assume presence of a angel process around each VM, but I think it is easier to be able to presume that there is one. It can be an incredibly thin stub if desired, so I don't think it'll be too onerous on implementations
We are looking into the possibility of not having a process manage a VM but rather having the sync_manager process register with a central daemon and exec into qemu (or anything else) so assuming there is a process per VM is essentially false. But the verb could be used for "unregistering" the current instance with the main manager so the verb does have it use.
Further more, even if we decide to leave the current 'sync_manager process per child process' system as is for now. The general direction is a central deamon per host for managing all the leases and guarding all processes. So be sure to keep that in mind while assembling the API.
On Mon, 2010-09-13 at 14:29 +0100, Daniel P. Berrange wrote:
Having a single daemon per host that exec's the VMs is explicitly *not* something we intend to support because the QEMU process needs to inherit its process execution state from libvirtd. It is fundamental to the security architecture that processes are completely isolated the moment that libvirtd has spawned them. We don't want to offload all the security driver setup into a central lock manager daemon. Aside from this we also pass open file descriptors down from libvirtd to the QEMU daemon.
My explanation might have been confusing or ill phrased. I'll try again. What the suggestion was: instead of libvirt running sync_manager that will fork off and run qemu, libvirt would run a sync_manager wrapper that will register with the central daemon, wait for it to acquire leases, and then exec to qemu (in process). From that moment the central daemon monitors the process and, when it quits, frees its leases. This way we still keep all the context stuff from libvirt and have only one process managing the leases. But, as I said, this is only a suggestion and is still in very early stages. We might not implement it in the initial version and leave the current forking method.
Daniel

On Mon, Sep 13, 2010 at 03:49:38PM +0200, Saggi Mizrahi wrote:
On Mon, Sep 13, 2010 at 03:20:13PM +0200, Saggi Mizrahi wrote:
On Mon, 2010-09-13 at 13:35 +0100, Daniel P. Berrange wrote:
Overall, this looks workable to me. As proposed, this assumes a 1:1 relation between LockManager process and managed VMs. But I guess you can still have a central manager process that manages all the VMs, by having the lock manager plugin spawn a simple shim process that does all the communication with the central lock manager.
I could have decided it such that it didn't assume presence of a angel process around each VM, but I think it is easier to be able to presume that there is one. It can be an incredibly thin stub if desired, so I don't think it'll be too onerous on implementations
We are looking into the possibility of not having a process manage a VM but rather having the sync_manager process register with a central daemon and exec into qemu (or anything else) so assuming there is a process per VM is essentially false. But the verb could be used for "unregistering" the current instance with the main manager so the verb does have it use.
Further more, even if we decide to leave the current 'sync_manager process per child process' system as is for now. The general direction is a central deamon per host for managing all the leases and guarding all processes. So be sure to keep that in mind while assembling the API.
On Mon, 2010-09-13 at 14:29 +0100, Daniel P. Berrange wrote:
Having a single daemon per host that exec's the VMs is explicitly *not* something we intend to support because the QEMU process needs to inherit its process execution state from libvirtd. It is fundamental to the security architecture that processes are completely isolated the moment that libvirtd has spawned them. We don't want to offload all the security driver setup into a central lock manager daemon. Aside from this we also pass open file descriptors down from libvirtd to the QEMU daemon.
My explanation might have been confusing or ill phrased. I'll try again. What the suggestion was: instead of libvirt running sync manager that will fork off and run qemu, libvirt would run sync_manager wrapper that will register with the central daemon wait for it to acquire leases and then exec to qemu (in process). From that moment the central daemon monitors the process and when it quits frees its leases. This way we still keep all the context stuff from libvirt and have only 1 process managing the leases. But, as I said, this is only a suggestion and is still in very early stages. We might not implement in the initial version and leave the current forking method.
That is probably possible with the current security driver implementations but more generally I think it will still hit some trouble. Specifically one of the items on our todo list is a new security driver that makes use of Linux container namespace functionality to isolate the VMs, so they can't even see other resources / processes on the host. This may well prevent the sync manager wrapper talking to a central sync manager process. The general rule we aim for is that once libvirtd has spawned a VM they are completely isolated, with the exception of any disks marked with <shareable/>. In other words, any communication channels must be initiated/established by the mgmt layer to the VM process, with nothing to be established in the reverse direction. Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

----- "Daniel P. Berrange" <berrange@redhat.com> wrote:
That is probably possible with the current security driver implementations but more generally I think it will still hit some trouble. Specifically one of the items on our todo list is a new security driver that makes use of Linux container namespace functionality to isolate the VMs, so they can't even see other resources / processes on the host. This may well prevent the sync manager wrapper talking to a central sync manager process. The general rule we aim for is that once libvirtd has spawned a VM they are completely isolated, with the exception of any disks marked with <shareable/>. In other words, any communication channels must be initiated/established by the mgmt layer to the VM process, with nothing to be established in the reverse direction.
Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
Correct me if I'm wrong, but the security limitations (selinux context) would only take effect after the "exec", no? So the process could still communicate with the daemon, open an FD and then exec. After exec, the VM would be locked down but the daemon could still wait on the FD to see whether the VM has died.

On Tue, Sep 14, 2010 at 05:03:21PM -0400, Ayal Baron wrote:
----- "Daniel P. Berrange" <berrange@redhat.com> wrote:
That is probably possible with the current security driver implementations but more generally I think it will still hit some trouble. Specifically one of the items on our todo list is a new security driver that makes use of Linux container namespace functionality to isolate the VMs, so they
can't even see other resources / processes on the host. This may well prevent the sync manager wrapper talking to a central sync mnager process The general rule we aim for is that once libvirtd has spawned a VM they are completely isolated with exception of any disks marked with <shareable/> In other words, any communictions channels must be initiated/established by the mgmt layer to the VM process, with nothing to be established in the reverse direction.
Correct me if I'm wrong, but the security limitations (selinux context) would only take effect after the "exec", no? so the process could still communicate with the daemon, open an FD and then exec. After exec, the VM would be locked down but the daemon could still wait on the FD to see whether VM has died.
It depends on which exec you are talking about here. If the comms to the daemon are done straight from the libvirtd plugin, then it would still be unrestricted. If the comms were done from the supervisor process, it would be restricted. Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
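The fd trick Ayal describes could look roughly like this on the wrapper side, assuming the central daemon holds the other end of the connection and treats EOF as "the lease holder has died"; the socket path is illustrative and the daemon-side poll loop is omitted.

  #include <sys/socket.h>
  #include <sys/un.h>
  #include <string.h>
  #include <unistd.h>

  /* Connect to the lock daemon, keep the fd across exec, then become QEMU.
   * When the QEMU process later exits, the kernel closes the socket and the
   * daemon sees EOF on its end. */
  static int
  registerAndExec(char **qemuArgv)
  {
      struct sockaddr_un addr;
      int fd;

      if ((fd = socket(AF_UNIX, SOCK_STREAM, 0)) < 0)
          return -1;

      memset(&addr, 0, sizeof(addr));
      addr.sun_family = AF_UNIX;
      strncpy(addr.sun_path, "/var/run/example-lockd.sock",
              sizeof(addr.sun_path) - 1);
      if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
          return -1;

      /* ...ask the daemon to acquire the leases and wait for its ack... */

      /* deliberately NOT setting FD_CLOEXEC: the fd must outlive the exec */
      execvp(qemuArgv[0], qemuArgv);
      return -1;   /* only reached if exec failed */
  }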

On Mon, Sep 13, 2010 at 02:29:49PM +0100, Daniel P. Berrange wrote:
We are looking into the possibility of not having a process manage a VM but rather having the sync_manager process register with a central daemon and exec into qemu (or anything else) so assuming there is a process per VM is essentially false. But the verb could be used for "unregistering" the current instance with the main manager so the verb does have it use.
Further more, even if we decide to leave the current 'sync_manager process per child process' system as is for now. The general direction is a central deamon per host for managing all the leases and guarding all processes. So be sure to keep that in mind while assembling the API.
Having a single daemon per host that exec's the VMs is explicitly *not* something we intend to support because the QEMU process needs to inherit its process execution state from libvirtd. It is fundamental to the security architecture that processes are completely isolated the moment that libvirtd has spawned them. We don't want to offload all the security driver setup into a central lock manager daemon. Aside from this we also pass open file descriptors down from libvirtd to the QEMU daemon.
I think there are two valid approaches people may want to take, both of which seem compatible with the requirements you describe.

Method 1 (per the RFC and current sync_manager design): libvirt forks a supervisor process which:
1. acquires lock
2. forks qemu
3. waits for qemu exit
4. releases lock

Method 2: without a supervisor process, libvirt:
1. calls plugin acquire_lock()
2. forks qemu
3. waits for qemu exit
4. calls plugin release_lock()

(I should add 2b: calls plugin set_pid() to tell the lock manager the qemu pid.)

In method 2, I expect the plugin functions would generally send ipc to a local lock manager daemon that holds and manages locks. (And may also wish to monitor the qemu pid itself.)

Dave
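Method 2, sketched from the caller's side; the entry-point names follow David's wording rather than the earlier proposal, and the bodies are trivial placeholders.

  #include <sys/types.h>
  #include <sys/wait.h>
  #include <unistd.h>

  /* Placeholder plugin entry points per method 2. */
  static int acquireLock(const char *vmuuid) { (void)vmuuid; return 0; }
  static int setPid(const char *vmuuid, pid_t pid) { (void)vmuuid; (void)pid; return 0; }
  static int releaseLock(const char *vmuuid) { (void)vmuuid; return 0; }

  static int
  runGuestMethod2(const char *vmuuid, char **qemuArgv)
  {
      pid_t pid;
      int status;

      if (acquireLock(vmuuid) < 0)          /* 1. take the lease first */
          return -1;

      if ((pid = fork()) < 0) {
          releaseLock(vmuuid);
          return -1;
      }
      if (pid == 0) {                       /* 2. fork qemu directly */
          execvp(qemuArgv[0], qemuArgv);
          _exit(127);
      }

      setPid(vmuuid, pid);                  /* 2b. let the lock daemon monitor it */
      waitpid(pid, &status, 0);             /* 3. wait for qemu exit */
      releaseLock(vmuuid);                  /* 4. drop the lease */
      return 0;
  }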

Sorry for commenting so late, but it was Rosh Hashana over here. Just to give you some pointers: sync_manager should also be zero-config. Host id, timeouts etc. are not in a config file. The host id is decided when sync_manager starts, and timeouts are decided upon lease creation so that all the sync_managers in the cluster are synchronized.

Also, when thinking about migration we have to think about the situation where the source host in the migration can't see the storage and thus can't update the lease. The solution we currently have in mind is using the leader version to make sure no one took the lease while the handover was happening. This will mean giving sync_manager an optional parameter (forceLVer?). This will all be managed by vdsm\rhev-m. You just have to keep in mind that there might be arguments other than the lease location. Maybe just add a generic addArg param. Or, if you think all locking mechanisms should have this feature, make it a part of the API.

On Fri, 2010-09-10 at 16:49 +0100, Daniel P. Berrange wrote:
A few weeks back David introduce sync-manager as a means for protecting against the same VM being started on multiple hosts:
https://www.redhat.com/archives/libvir-list/2010-August/msg00179.html
This is obviously a desirable feature for libvirt, but we don't want to have a hardcoded dependancy on a particular implementation. We can achieve isolation between libvirt & sync-manager, allowing for alternative implementations by specifying a formal plugin for supervisor processes.
What follows is my mockup of what a plugin API would look like, its internal libvirt API and an outline of the usage scenarios in psuedo code.
At the minimum level of compliance, the plugin implementation provides for per-VM level exclusion. Optionally a plugin can declare that it
- supports locks against individual resources (disks) - supports hotplug of individual resources - supports migration of the supervised process even
Migration/hotplug will be forbid if those capabilities aren't declared by the plugin.
In parallel with David's work on sync-manager, I intend to write a simple plugin implementation based solely on fcntl() locking. It is important that we have 2 independant implementations to prove the validity of the plugin API. The fcntl() impl should provide a zero-config impl we can deploy out of the box that will provide protection against 2 vms using the same disk image on a host, and limited protection in multi-host scenarios with shared filesystems (no protection for shared block devs). Perhaps we should have a 3rd impl based on libdlm too, for Red Hat ClusterSuite scenarios, though sync-manager may well be general & simple enough to easily cope with this already.
Daniel
participants (7): Ayal Baron, Daniel P. Berrange, Daniel Veillard, David Teigland, Eric Blake, Richard W.M. Jones, Saggi Mizrahi