On Wed, Aug 11, 2010 at 03:37:12PM -0400, David Teigland wrote:
On Wed, Aug 11, 2010 at 05:59:55PM +0100, Daniel P. Berrange wrote:
> On Tue, Aug 10, 2010 at 12:44:06PM -0400, David Teigland wrote:
> > Hi,
> >
> > We've been working on a program called sync_manager that implements
> > shared-storage-based leases to protect shared resources. One way we'd like
> > to use it is to protect vm images that reside on shared storage,
> > i.e. preventing two vm's on two hosts from using the same image at once.
>
> There are two different, but related problems here:
>
> - Preventing 2 different VMs using the same disk
> - Preventing the same VM running on 2 hosts at once
>
> The first requires that there is a lease per configured disk (since
> a guest can have multiple disks). The latter requires a lease per
> VM and can ignore specifics of what disks are configured.
>
> IIUC, sync-manager is aiming for the latter.
The present integration effort is aiming for the latter. sync_manager
itself aims to be agnostic about what it's managing.
Ok, it makes a bit of a difference to how we integrate with
it in libvirt. If we want to ever let sync-manager do per-disk
leases then we'd want to pass more info to the SM callbacks
so it knows what disks QEMU has, instead of just its name.
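
To make the per-VM versus per-disk distinction concrete, here is a minimal Python sketch of which lease names a guest would need under each scheme. The function name and the "vm:" / "disk:" naming are purely hypothetical and are not part of sync_manager or libvirt; the point is only that the per-disk case needs the disk list, not just the guest name.

```python
def lease_names(vm_name, disk_paths, per_disk):
    """Return the lease resource names a guest would need (hypothetical naming)."""
    if per_disk:
        # One lease per configured disk: stops two different VMs sharing a disk.
        return sorted(f"disk:{path}" for path in set(disk_paths))
    # One lease per VM: stops the same VM running on two hosts.
    return [f"vm:{vm_name}"]
```

With per-disk leases, two differently named guests that point at the same image end up contending for the same lease name; with per-VM leases they do not.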
> > It's functional, and the next big step is using it through libvirt.
> >
> > sync_manager "wraps" a process, i.e. acquires a lease, forks&execs a
> > process, renews the lease while the process runs, and releases the lease
> > when the process exits. While the process runs, it has exclusive access
> > to whatever resource was named in the lease that was acquired.
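
As a rough illustration of the wrap cycle described above (acquire, fork and exec, renew while running, release on exit), here is a minimal Python sketch. The InMemoryLease class and run_with_lease function are assumptions made up for this example; the real sync_manager keeps its leases on shared storage, and its on-disk format and renewal timing are not shown here.

```python
import subprocess
import sys
import time


class InMemoryLease:
    """Hypothetical stand-in for an on-disk lease; not sync_manager's format."""

    def __init__(self, resource):
        self.resource = resource
        self.owner = None
        self.renewals = 0

    def acquire(self, host_id):
        if self.owner is not None:
            raise RuntimeError(f"{self.resource} is already leased by {self.owner}")
        self.owner = host_id

    def renew(self, host_id):
        if self.owner != host_id:
            raise RuntimeError("lease lost")
        self.renewals += 1

    def release(self, host_id):
        if self.owner == host_id:
            self.owner = None


def run_with_lease(lease, host_id, argv, renew_interval=0.05):
    """Acquire the lease, run argv, renew while it runs, release on exit."""
    lease.acquire(host_id)
    child = None
    try:
        child = subprocess.Popen(argv)
        while child.poll() is None:
            lease.renew(host_id)
            time.sleep(renew_interval)
        return child.returncode
    finally:
        # If renewal failed, the child must not outlive the lease.
        if child is not None and child.poll() is None:
            child.kill()
            child.wait()
        lease.release(host_id)


if __name__ == "__main__":
    lease = InMemoryLease("vm-image")
    run_with_lease(lease, "hostA", [sys.executable, "-c", "print('guest running')"])
```

Killing the child when renewal fails is a design choice of this sketch: the wrapped process only has exclusive access for as long as the lease is actually held.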
>
> There are complications around migration we need to consider too.
> During migration, you actually need QEMU running on two hosts at
> once. IIRC the idea is that before starting the migration operation,
> we'd have to tell sync-manager to mark the lease as shared with a
> specific host. The destination QEMU would have to start up in shared
> mode, and upgrade this to an exclusive lock when migration completes,
> or quit when migration fails.
sync_manager leases can only be exclusive, so it's a matter of transferring
ownership of the exclusive lock from source host to destination host. We
have not yet added lease transfer capabilities to sync_manager, but it
might look something like this:
In the past I've discussed with the original SM authors the idea of
introducing a shared lease concept. The idea was
  - QEMU is running on S with an exclusive lease
  - User requests migration to D
  - SM on S downgrades the exclusive lease to a shared
    lease, shared only with host D
  - libvirt starts QEMU on D host to accept migration
  - SM on D grabs the shared lease
  - libvirt starts migration on S
  - If migration succeeds
      - libvirt kills QEMU on S
      - SM on D upgrades its shared lease to exclusive
  - If migration fails
      - libvirt kills QEMU on D
      - SM on S upgrades its shared lease to exclusive
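
The shared-lease sequence above can be sketched as a small state model in Python. This is only a toy under stated assumptions: the MigrationLease class, its method names, and the migrate helper are hypothetical, and nothing here reflects an existing sync_manager interface (which, as noted, has exclusive leases only).

```python
class MigrationLease:
    """Toy model of the exclusive/shared lease idea; all names are hypothetical."""

    def __init__(self, owner):
        self.mode = "exclusive"
        self.holders = {owner}
        self.allowed = {owner}

    def downgrade_to_shared(self, holder, peer):
        # The source gives up exclusivity, but only in favour of one named peer.
        if self.mode != "exclusive" or self.holders != {holder}:
            raise RuntimeError("only the exclusive holder can downgrade")
        self.mode = "shared"
        self.allowed = {holder, peer}

    def acquire_shared(self, host):
        if self.mode != "shared" or host not in self.allowed:
            raise RuntimeError(f"{host} may not share this lease")
        self.holders.add(host)

    def drop(self, host):
        self.holders.discard(host)

    def upgrade_to_exclusive(self, host):
        # Only possible once the other side has gone away.
        if self.holders != {host}:
            raise RuntimeError("another host still holds the lease")
        self.mode = "exclusive"
        self.allowed = {host}


def migrate(lease, src, dst, succeeded):
    """Walk the lease through one migration attempt and return the final owner."""
    lease.downgrade_to_shared(src, dst)
    lease.acquire_shared(dst)
    loser, winner = (src, dst) if succeeded else (dst, src)
    lease.drop(loser)  # libvirt kills QEMU on the losing host
    lease.upgrade_to_exclusive(winner)
    return winner
```

Either way the attempt ends, exactly one host is left with an exclusive lease, and a third host can never join while the lease is shared.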
S = source host, sm-S = sync_manager on S, ...
D = destination host, sm-D = sync_manager on D, ...
1. sm-S holds the lease, and is monitoring qemu
2. migration begins from S to D
3. libvirt-D runs sm-D: sync_manager -c qemu with the addition of a new
sync_manager option --receive-lease
4. sm-D writes its hostid D to the lease area signaling sm-S that it wants
to be the lease owner when S is done with it
5. sm-D begins monitoring the lease owner on disk (which is still S)
6. sm-D forks qemu-D
7. sm-S sees that D wants the lease
8. qemu-S exits with success
9. sm-S sees qemu-S exit with success
10. sm-S writes D as the lease owner into the lease area and exits
(in the non-migration/transfer case, sm-S writes owner=LEASE_FREE)
11. sm-D (still monitoring the lease owner) sees that it has become the
owner, and begins renewing the lease
12. qemu-D runs fully
I don't know enough (anything) about qemu migration yet to say if those
steps work correctly or safely. One concern is that qemu-D should not
enter a state where it can write until we are certain that D has been
written as the lease's owner.
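
For comparison, the ownership-transfer steps above can be sketched in Python as follows. The LeaseArea class, the next_owner field, and the function names are assumptions for illustration only; LEASE_FREE is the name used in step 10, but its value here is made up. The steps do not say what happens if qemu-S exits with failure, so the sketch leaves that case out.

```python
LEASE_FREE = 0  # name taken from step 10; the value is an assumption


class LeaseArea:
    """Hypothetical in-memory stand-in for the on-disk lease area."""

    def __init__(self, owner):
        self.owner = owner
        self.next_owner = LEASE_FREE


def request_lease(area, host_id):
    # Step 4: the destination records that it wants the lease next.
    if area.next_owner != LEASE_FREE:
        raise RuntimeError("another host has already requested this lease")
    area.next_owner = host_id


def source_release(area, host_id):
    # Steps 9-10: once qemu has exited, hand the lease over or free it.
    if area.owner != host_id:
        raise RuntimeError("not the lease owner")
    area.owner = area.next_owner  # LEASE_FREE when nobody asked for it
    area.next_owner = LEASE_FREE


def may_write(area, host_id):
    # Step 11 and the safety concern: only the recorded owner may write.
    return area.owner == host_id
```

The may_write check is where the stated concern shows up: between steps 4 and 10 the destination has asked for the lease but is not yet its owner, so qemu-D must not be writing during that window.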
Unfortunately the way migration works with QEMU prevents this
scenario. This led us to invent the idea of a shared lease
that is only used during migration.
> The one further complication is with the security drivers. IIUC, we will
> absolutely not want QEMU to have any access to the shared storage lease
> area. The problem is that if we just inject the wrapper process as is,
> sync-manager will end up running with exact same privileges as QEMU.
> i.e. same UID:GID, and same SELinux context. I'm really not at all sure
> how to deal with this problem, because our core design is that the thing
> we spawn inherits the privileges we set up at fork() time. We don't want
> to delegate the security setup to sync-manager, because it introduces
> a huge variable condition in the security system. We need guaranteed
> consistent security setup for QEMU, regardless of supervisor process
> in use.
It might not be a big problem for qemu to write to its own lease area,
but writing to another's probably would (e.g. at a different offset on the
same lv). That implies a separate lease lv per qemu; I'll have to find
out how close that gets to lvm scalability limits.
Since SM is such an important process / job, I think it is really worth
trying to get strict separation between SM and QEMU. Our goal with QEMU
security is that QEMU can never access any host resource that isn't
explicitly assigned via the XML config. This implies that it shouldn't
be allowed to access any SM data, even if this would theoretically not
cause problems for SM mutual exclusion.
Regards,
Daniel
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|