On Wed, Oct 15, 2008 at 10:23:03AM -0700, Itamar Heim wrote:
> Hi,
> I am interested to find out how libvirt envisions image locking,
> i.e., how do we make sure multiple nodes are not trying to access the
> same storage volume, probably causing image corruption.
> I know this can be solved by means of a cluster, but it seems excessive
> (and not possible in all scenarios).
> Central administration (ovirt-like) is also problematic, since unless
> fencing is assumed, it cannot verify the image is no longer being used.
> It would help if the image/storage volume could be locked (or
> lease-locked), either by the storage/LVM/NFS preventing a node from
> accessing a specific image, or by having libvirt/qemu mutually check
> the lock before accessing it (very simply, with a leased lock we can
> check every X minutes, and kill the process to make sure it honors the
> lock).

In the domain XML format, the semantics are that every <disk> section
added to a guest config is read-write, with an exclusive lock. To allow
multiple guests to use the same disk, it is intended that you add either
a <readonly/> or <shareable/> element within the <disk>.
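
As a rough sketch of those semantics (this is not libvirt code; the sample
guest config, the paths and the disk_access_modes() helper are all made up
for illustration), the access mode of each disk could be derived from the
domain XML like this:

```python
import xml.etree.ElementTree as ET

# Hypothetical guest config, trimmed down to the <disk> elements.
DOMAIN_XML = """
<domain type='xen'>
  <name>demo</name>
  <devices>
    <disk type='file' device='disk'>
      <source file='/var/lib/images/root.img'/>
      <target dev='xvda'/>
    </disk>
    <disk type='file' device='cdrom'>
      <source file='/var/lib/images/install.iso'/>
      <target dev='xvdb'/>
      <readonly/>
    </disk>
    <disk type='block' device='disk'>
      <source dev='/dev/vg0/shared'/>
      <target dev='xvdc'/>
      <shareable/>
    </disk>
  </devices>
</domain>
"""


def disk_access_modes(domain_xml):
    """Map each disk source path to 'exclusive', 'readonly' or 'shareable'."""
    modes = {}
    root = ET.fromstring(domain_xml)
    for disk in root.findall("./devices/disk"):
        source = disk.find("source")
        if source is None:
            continue  # e.g. a CD-ROM drive with no media
        path = source.get("file") or source.get("dev")
        if disk.find("readonly") is not None:
            modes[path] = "readonly"
        elif disk.find("shareable") is not None:
            modes[path] = "shareable"
        else:
            # No marker element: read-write with an exclusive lock.
            modes[path] = "exclusive"
    return modes
```
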
That all said, we only implement this for the Xen driver, handing off
the actual logic to XenD to perform. That we don't implement this in
the QEMU driver is a clear shortcoming that needs addressing.
If we only care about locking within the scope of a single host, it is
straightforward - libvirtd knows all VMs and their config, so it can
ensure the appropriate exclusivity checks are done at time of VM start.
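
To make the single-host case concrete, here is a minimal sketch of such a
check. It is not what libvirtd does today; the find_conflicts() helper and
its data layout are hypothetical, and I'm assuming the simplest possible
policy: two guests may use the same disk only if neither of them holds it
in the default exclusive mode.

```python
def find_conflicts(running, new_disks):
    """Return (path, vm_name) pairs that should block a VM start.

    running:   dict of vm_name -> {disk_path: mode} for active guests
    new_disks: {disk_path: mode} for the guest about to be started
    mode is one of 'exclusive', 'readonly' or 'shareable'.
    """
    conflicts = []
    for vm_name, disks in running.items():
        for path, mode in disks.items():
            if path not in new_disks:
                continue
            # Assumed policy: any exclusive user rules out sharing.
            if mode == "exclusive" or new_disks[path] == "exclusive":
                conflicts.append((path, vm_name))
    return conflicts


# Example: 'web1' already holds root.img exclusively.
running = {
    "web1": {"/images/root.img": "exclusive", "/images/tools.iso": "readonly"},
}
print(find_conflicts(running, {"/images/root.img": "exclusive"}))
print(find_conflicts(running, {"/images/tools.iso": "readonly"}))
```
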
As you point out, ideally this locking would be enforced across hosts
too, in the case of shared storage. Cluster software can't actually
magically solve this for us - it can really only make sure the same
VM is not started twice. I'm not sure that libvirt can necessarily
solve it in the general case either, but we can at least make an
effort in some cases. If, for instance, we were to take a proper
fcntl() lock over the files, this would work for disks backed by a
file on shared filesystems like NFS / GFS. This will actually play
nicely with cluster software too, since it can be made to forcibly
release NFS locks when fencing a node. fcntl() locks won't work on
disks backed by iSCSI/SCSI block devices though. It is possible that
SCSI reservations can help in the Fibre Channel case.
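
For the file-backed case, the fcntl() idea might look roughly like the
following. This is only a sketch using Python's standard fcntl module, not
a proposed patch; lock_disk_image() is a made-up helper, and I'm assuming
read-only/shared disks would take a shared lock while everything else takes
an exclusive one.

```python
import errno
import fcntl


def lock_disk_image(path, shared=False):
    """Try to take an fcntl() lock over a whole disk image, without blocking.

    Returns the open file object on success, or None if another process
    already holds a conflicting lock. The caller must keep the file open
    for as long as the guest runs, since closing it drops the lock.
    """
    # A write lock needs the file open for writing, a read lock for reading.
    f = open(path, "rb" if shared else "r+b")
    op = (fcntl.LOCK_SH if shared else fcntl.LOCK_EX) | fcntl.LOCK_NB
    try:
        fcntl.lockf(f, op)  # default length 0 means "to end of file"
    except OSError as e:
        f.close()
        if e.errno in (errno.EACCES, errno.EAGAIN):
            return None  # someone else holds the lock
        raise
    return f
```

Note that these locks are advisory and owned per process: they only help
if every process touching the image also takes the lock, and they will not
catch two guests started from within the same process.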
So as an immediate option we should perform the exclusivity checks in
libvirt, and also apply fcntl() locks over file-backed disks.
Daniel
--
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|