[Libvir] Concepts in storage management

So we've had many email threads on the subject of storage, but none have resulted in a satisfactory way forward to implementing any storage management APIs. Part of the problem, I think, is that we've not tried to understand all the various concepts / technologies which are available & how they relate to each other. This mail attempts to outline all the different technologies. There's a short list of API operations, but I don't particularly want to get into API details until we have a good understanding of the concepts.

First and foremost, I don't believe it is acceptable to say we're only going to allow one kind of storage. Storage is the key piece of infrastructure for any serious network, and we have to be able to adapt to the deployment scenarios that present themselves.

Second, there is clearly a huge number of storage technologies here and there's no way we'll implement support for all of them in one go. So we need to prioritize getting the conceptual model correct, to allow us to incrementally support new types of storage backend.
Taxonomy of storage types
=========================

+- Block
|  |
|  +- Disk
|  |  |
|  |  +- Direct attached
|  |  |  |
|  |  |  +- IDE/ATA disk
|  |  |  +- SCSI disk
|  |  |  +- FibreChannel disk
|  |  |  +- USB disk/flash
|  |  |  +- FireWire disk/flash
|  |  |
|  |  +- Remote attached
|  |     |
|  |     +- iSCSI disk
|  |     +- GNBD disk
|  |
|  +- Partition
|  |
|  +- Virtual
|     |
|     +- Direct attached
|     |  |
|     |  +- LVM
|     |  +- ZFS
|     |
|     +- Remote attached
|        |
|        +- Cluster LVM
|
+- FileSystem
|  |
|  +- Direct attached
|  |  |
|  |  +- ext2/3/4
|  |  +- xfs
|  |  +- ZFS
|  |
|  +- Remote attached
|     |
|     +- NFS
|     +- GFS
|     +- OCFS2
|
+- Directory
|
+- File
   |
   +- Raw allocated
   +- Raw sparse
   +- QCow2
   +- VMDK

Storage attributes
==================

- Local vs network (ext3 vs NFS, SCSI vs iSCSI)
- Private vs shared (IDE vs FibreChannel)
- Pool vs volume (LVM VG vs LV, Directory vs File, Disk vs Partition)
- Container vs guest (OpenVZ vs Xen)
- Attributes
  - Compressed
  - Encrypted
  - Auto-extend
- Snapshots
  - RO
  - RW
- Partition table
  - MBR
  - GPT
- UUID
  - 16 hex digits
  - Unique string
  - SCSI WWID (world wide ID)
- Local Path(s) (/dev/sda, /var/lib/xen/images/foo.img)
- Server Hostname
- Server Identifier (export path/target)
- MAC security label (SELinux)
- Redundancy
  - Mirrored
  - Striped
  - Multipath
- Pool operation
  - RO
  - RW

Nesting hierarchy
=================

Many possibilities...
- 1 x Host -> N x iSCSI target -> N x LUN -> N x Partition
- N x Disk/Partition -> 1 x LVM VG -> N x LVM LV
- 1 x Filesystem -> N x directory -> N x file
- 1 x File -> 1 x Block (loopback)

Application users
=================

- virt-manager / virt-install
  - Enumerate available pools
  - Allocate volume from pool
  - Create guest with volume
- virt-clone
  - Copy disks
  - Snapshot disks
- virt-df
  - Filesystem usage
- pygrub
  - Extract kernel/initrd from filesystem
- virt-factory
  - Manage storage pools
  - Pre-migration sanity checks
- virt-backup
  - Snapshot disks
- virt-p2v
  - Snapshot disks

Storage representation
======================

Two core concepts

- Volume
  - a chunk of storage
  - assignable to a guest
  - assignable to a pool
  - optionally part of a pool
- Pool
  - a chunk of storage
  - contains free space
  - allocate to provide volumes
  - comprised of volumes

Recursive!

  n x Volume -> Pool -> n x Volume

Nesting to many levels...

Do we need an explicit Filesystem concept ?

Operations
==========

Limited set of operations to perform

- List host volumes (physical attached devices)
- List pools (logical volume groups, partitioned devs, filesystems)
- List pool volumes (dev partitions, LVM logical volumes, files)
- Define pool (eg create directory, or define iSCSI target)
- Undefine pool (delete directory, undefine iSCSI config)
- Activate pool (mount NFS volume, login to iSCSI target)
- Deactivate pool (unmount volume, logout of iSCSI)
- Dump pool XML (get all the metadata)
- Lookup by path
- Lookup by UUID
- Lookup by name
- Create volume (create a file, allocate a LVM LV, etc)
- Destroy volume (delete a file, deallocate a LVM LV)
- Resize volume (grow or shrink volume)
- Copy volume (copy data between volumes)
- Snapshot volume (snapshot a volume)
- Dump volume XML (get all the metadata)
- Lookup by path
- Lookup by UUID
- Lookup by name

http://www.redhat.com/archives/libvir-list/2007-February/msg00010.html
http://www.redhat.com/archives/libvir-list/2007-September/msg00119.html

Do we also need some explicit Filesystem APIs ?

XML description
===============

The horrible recursiveness & specific attributes are all in the XML description for different storage pool / volume types. This is where we define things like what physical volumes are in a volume group, iSCSI server / target names, login details, etc, etc

XXX fill in the hard stuff for metadata description here

Implementation backends
=======================

- FileSystem/Directory/File - POSIX APIs
- LVM - LVM tools, or libLVM
- Disk/partitions - sysfs / parted
- iSCSI - sysfs / iscsi utils
- ZFS - ZFS tools

Implementation strategy
=======================

Should prioritize implementation according to immediate application needs.

Initial goal to support remote guest creation on par with current capabilities:

- Directory + allocating raw sparse files
- Enumerate existing disks, partitions & LVM volumes

Further work:

- Allocating LVM volumes
- Defining LVM volume groups
- Partitioning disks
- Mounting networked filesystems
- Accessing iSCSI volumes
- Copying existing volumes
- Snapshotting volumes
- Cluster aware filesystems (GFS)
- Various file formats (QCow, VMDK, etc)

Dan.

-- 
|=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=|
|=- Perl modules: http://search.cpan.org/~danberr/ -=|
|=- Projects: http://freshmeat.net/~danielpb/ -=|
|=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|
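[Editorial note: the pool / volume operations above can be sketched as plain Python objects to make the two-concept model concrete. Every class and method name here is a hypothetical illustration for discussion, not the eventual libvirt API.]

```python
# Hypothetical sketch of the pool / volume operations listed above.

class Volume:
    def __init__(self, name, path, capacity):
        self.name = name          # unique name within the pool
        self.path = path          # local path, eg /dev/VG/LV or a file
        self.capacity = capacity  # size in bytes

class Pool:
    def __init__(self, name):
        self.name = name
        self.active = False
        self.volumes = {}

    def activate(self):
        # eg mount an NFS export, or log in to an iSCSI target
        self.active = True

    def deactivate(self):
        # eg unmount the filesystem, or log out of the iSCSI target
        self.active = False

    def create_volume(self, name, path, capacity):
        # eg create a file, or allocate an LVM LV
        vol = Volume(name, path, capacity)
        self.volumes[name] = vol
        return vol

    def destroy_volume(self, name):
        # eg delete the file, or deallocate the LVM LV
        del self.volumes[name]

    def lookup_by_path(self, path):
        return next(v for v in self.volumes.values() if v.path == path)

pool = Pool("default")
pool.activate()
vol = pool.create_volume("guest1", "/var/lib/xen/images/guest1.img", 4 << 30)
```

The point of the sketch is only that the whole operations list collapses onto two object types plus lookup methods; the XML metadata per pool/volume type carries everything backend-specific.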

Daniel P. Berrange wrote:
http://www.redhat.com/archives/libvir-list/2007-September/msg00119.html
Since that thread is split across two months, can I bring to everyone's attention the post I made yesterday:

http://www.redhat.com/archives/libvir-list/2007-October/msg00057.html

In particular the concept at the end that we shouldn't even try to support every possible remote storage, but instead allow the administrator to write "scriptlets" (small shell scripts with a well-defined input & output) to perform a set of operations:

----- /etc/libvirtd.conf -------------------
allocate partition: "lvcreate -L %size -n %name XenVolGroup"
list partitions: "lvs --xml"
--------------------------------------------

We can provide sample scriptlets for different operating systems and storage configurations.

Rich.

-- 
Emerging Technologies, Red Hat - http://et.redhat.com/~rjones/
Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street,
Windsor, Berkshire, SL4 1TE, United Kingdom.
Registered in England and Wales under Company Registration No. 03798903
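[Editorial note: a sketch of how such a scriptlet might be expanded before being run -- the %key placeholders are filled in from the API call's arguments, with each value shell-quoted. The placeholder syntax comes from the proposal above; the helper function itself is hypothetical.]

```python
# Hypothetical expansion of a scriptlet template from /etc/libvirtd.conf.

import shlex

def expand_scriptlet(template, **params):
    """Substitute %key placeholders, shell-quoting each value."""
    cmd = template
    for key, value in params.items():
        cmd = cmd.replace("%" + key, shlex.quote(str(value)))
    return cmd

cmd = expand_scriptlet("lvcreate -L %size -n %name XenVolGroup",
                       size="4G", name="guest1")
# cmd could then be handed to the shell / subprocess for execution
```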

On Tue, Oct 16, 2007 at 04:34:26PM +0100, Richard W.M. Jones wrote:
Daniel P. Berrange wrote:
http://www.redhat.com/archives/libvir-list/2007-September/msg00119.html
Since that thread is split across two months, can I bring to everyone's attention the post I made yesterday:
http://www.redhat.com/archives/libvir-list/2007-October/msg00057.html
In particular the concept at the end that we shouldn't even try to support every possible remote storage, but instead allow the administrator to write "scriptlets" (small shell scripts with a well-defined input & output) to perform a set of operations:
This is really just an implementation detail. We still need to define the storage concepts we want to expose in the public API before figuring out the backend implementation. Most of the implementation will pretty much have to follow the scheme of just invoking command line tools like lvcreate and lvs, since formal APIs are scarce.

Dan.

Daniel P. Berrange wrote:
On Tue, Oct 16, 2007 at 04:34:26PM +0100, Richard W.M. Jones wrote:
Daniel P. Berrange wrote:
http://www.redhat.com/archives/libvir-list/2007-September/msg00119.html Since that thread is split across two months, can I bring to everyone's attention the post I made yesterday:
http://www.redhat.com/archives/libvir-list/2007-October/msg00057.html
In particular the concept at the end that we shouldn't even try to support every possible remote storage, but instead allow the administrator to write "scriptlets" (small shell scripts with a well-defined input & output) to perform a set of operations:
This is really just an implementation detail. We still need to define the storage concepts we want to expose in the public API before figuring out the backend implementation. Most of the implementation will pretty much have to follow the scheme of just invoking command line tools like lvcreate and lvs, since formal APIs are scarce.
Well, a basic set of operations would be whatever we need to implement virt-install/virt-manager remotely now, plus other suggestions as they come along. From a fairly brief scan of the virt-install & virt-manager code that would be:

- Create an empty file with given name & size & sparseness.
- Detect if a named device or file exists (basically a remote stat).
- Copy image to remote temporary file (for kernel/CD-ROM).
- Check free space (remote statvfs).

It might be nice to list LVs, but it doesn't seem to be necessary to implement remote virt-* at the moment (AFAICS).

Rich.
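[Editorial note: the four primitives above map onto plain POSIX calls. A local sketch follows -- in the real feature these would run on the remote host via the daemon, and the function names here are illustrative only.]

```python
# Local sketch of the four remote primitives, using plain POSIX calls.

import os
import shutil
import tempfile

def create_sparse(path, size):
    # create an empty sparse file of the given size
    with open(path, "wb") as f:
        f.truncate(size)

def exists(path):
    # "remote stat"
    return os.path.exists(path)

def copy_image(src, dst):
    # copy an image (eg kernel / CD-ROM) to a temporary file
    shutil.copyfile(src, dst)

def free_space(path):
    # "remote statvfs": bytes available to unprivileged users
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize

workdir = tempfile.mkdtemp()
img = os.path.join(workdir, "guest.img")
create_sparse(img, 1 << 20)
```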

On Tue, Oct 16, 2007 at 05:16:33PM +0100, Richard W.M. Jones wrote:
Daniel P. Berrange wrote:
On Tue, Oct 16, 2007 at 04:34:26PM +0100, Richard W.M. Jones wrote:
Daniel P. Berrange wrote:
http://www.redhat.com/archives/libvir-list/2007-September/msg00119.html Since that thread is split across two months, can I bring to everyone's attention the post I made yesterday:
http://www.redhat.com/archives/libvir-list/2007-October/msg00057.html
In particular the concept at the end that we shouldn't even try to support every possible remote storage, but instead allow the administrator to write "scriptlets" (small shell scripts with a well-defined input & output) to perform a set of operations:
This is really just an implementation detail. We still need to define the storage concepts we want to expose in the public API, before figuring out on the backend implementation. Most of the implementation wiill pretty much have to follow the scheme of just invoking command line tools like lvcreate and lvs, since formal APIs are scarse.
Well, a basic set of operations would be whatever we need to implement virt-install/virt-manager remotely now, plus other suggestions as they come along.
From a fairly brief scan of the virt-install & virt-manager code that would be:
- Create an empty file with given name & size & sparseness.
- Detect if a named device or file exists (basically a remote stat).
- Copy image to remote temporary file (for kernel/CD-ROM).
- Check free space (remote statvfs).
It might be nice to list LVs, but it doesn't seem to be necessary to implement remote virt-* at the moment (AFAICS).
Current virt-manager doesn't enumerate block devices at all - it just presents a file selection dialog rooted in /dev, letting you select a block device, be it a disk or a logical volume. Using LVM volumes for guests is probably more common than using raw partitions, based on the user reports I see.

Dan.

Daniel P. Berrange wrote:
Second, there is clearly a huge number of storage technologies here and there's no way we'll implement support for all of them in one go. So we need to prioritize getting the conceptual model correct, to allow us to incrementally support new types of storage backend.
Yeah, I think that's the right place to start. As you say, there are just too many underlying storage technologies to go after all of them at once. <snip>
Storage representation ======================
Two core concepts
- Volume
  - a chunk of storage
  - assignable to a guest
  - assignable to a pool
  - optionally part of a pool
- Pool
  - a chunk of storage
  - contains free space
  - allocate to provide volumes
  - comprised of volumes
Recursive!
n x Volume -> Pool -> n x Volume
Nesting to many levels...
Kind of, though I think there are actually two concepts of Volumes here (if I am understanding correctly). The first concept of volume is "raw storage" -> what you assign to a pool. The second concept is "Volume exported for a guest". I'm not sure that we want to nest those concepts.
Do we need an explicit Filesystem concept ?
Operations ==========
Limited set of operations to perform
- List host volumes (physical attached devices)
- List pools (logical volume groups, partitioned devs, filesystems)
- List pool volumes (dev partitions, LVM logical volumes, files)
- Define pool (eg create directory, or define iSCSI target)
- Undefine pool (delete directory, undefine iSCSI config)
- Activate pool (mount NFS volume, login to iSCSI target)
- Deactivate pool (unmount volume, logout of iSCSI)
- Dump pool XML (get all the metadata)
- Lookup by path
- Lookup by UUID
- Lookup by name
- Create volume (create a file, allocate a LVM LV, etc)
- Destroy volume (delete a file, deallocate a LVM LV)
- Resize volume (grow or shrink volume)
- Copy volume (copy data between volumes)
- Snapshot volume (snapshot a volume)
- Dump volume XML (get all the metadata)
- Lookup by path
- Lookup by UUID
- Lookup by name
http://www.redhat.com/archives/libvir-list/2007-February/msg00010.html
http://www.redhat.com/archives/libvir-list/2007-September/msg00119.html
Do we also need some explicit Filesystem APIs ?
The question I have with all of this is whether it really belongs in libvirt at all. Many of these concepts apply to bare-metal provisioning as well; so it might be a good idea to have a separate "libstorage" that libvirt links to, and that other tools might use.
XML description ===============
The horrible recursiveness & specific attributes are all in the XML description for different storage pool / volume types. This is where we define things like what physical volumes are in a volume group, iSCSI server / target names, login details, etc, etc
XXX fill in the hard stuff for metadata description here
Implementation backends =======================
- FileSystem/Directory/File - POSIX APIs
- LVM - LVM tools, or libLVM
- Disk/partitions - sysfs / parted
- iSCSI - sysfs / iscsi utils
- ZFS - ZFS tools
The problem with most of these, as we all know, is that they only have command-line utilities, and no corresponding libraries. That makes it difficult for a library like libvirt to support them. That is, we can shell out to the commands, but then we run into a situation where different versions of the LVM command, for example, have different output. We have now effectively tied ourselves to a particular version of a tool, which is fairly disappointing. Also, as we have seen with xend, spawning external tools to do work makes error reporting far more difficult (maybe impossible).

So that leaves us the tough question of what to do here. Ideally we would abstract all of the above tools into libraries, re-write the tools to use those, and then have libvirt use the same. I'm not sure if it is practical, however, to wait that long to do such things.

What I might suggest is a hybrid approach. Do the initial implementation with what we have (namely the command line utilities, possibly utilizing rjones' "scriptlet" concept). In parallel, make sure someone starts on making real libraries of the tools, so that we can benefit from that later on.
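[Editorial note: the version-skew worry in miniature -- when shelling out, the only contract is the tool's textual output. Asking the tool for machine-readable, fixed-unit output narrows the risk but does not remove it. The sample below mimics `lvs --noheadings --units b --separator ,` style output; the exact field layout is an assumption for illustration.]

```python
# Hypothetical defensive parser for comma-separated `lvs` output.

def parse_lvs(output):
    """Parse a comma-separated LV listing into (lv, vg, size_bytes) tuples."""
    rows = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        lv, vg, size = line.split(",")[:3]
        rows.append((lv, vg, int(size.rstrip("B"))))
    return rows

# canned sample standing in for real tool output
sample = ("  guest1,XenVolGroup,4294967296B\n"
          "  guest2,XenVolGroup,8589934592B\n")
volumes = parse_lvs(sample)
```

Even with a fixed separator and fixed units, a field reorder in a future tool release silently breaks such a parser -- which is exactly the argument for real libraries.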
Implementation strategy =======================
Should prioritize implementation according to immediate application needs
Initial goal to support remote guest creation on par with current capabilities:
- Directory + allocating raw sparse files
- Enumerate existing disks, partitions & LVM volumes
Yep. As long as we make sure the XML is flexible enough to handle the remaining stuff, this seems like a good first place to start. Chris Lalancette

On Tue, Oct 16, 2007 at 12:30:45PM -0400, Chris Lalancette wrote:
Daniel P. Berrange wrote:
Second, there is clearly a huge number of storage technologies here and there's no way we'll implement support for all of them in one go. So we need to prioritize getting the conceptual model correct, to allow us to incrementally support new types of storage backend.
Yeah, I think that's the right place to start. As you say, there are just too many underlying storage technologies to go after all of them at once.
<snip>
Storage representation ======================
Two core concepts
- Volume
  - a chunk of storage
  - assignable to a guest
  - assignable to a pool
  - optionally part of a pool
- Pool
  - a chunk of storage
  - contains free space
  - allocate to provide volumes
  - comprised of volumes
Recursive!
n x Volume -> Pool -> n x Volume
Nesting to many levels...
Kind of, though I think there are actually two concepts of Volumes here (if I am understanding correctly). The first concept of volume is "raw storage" -> what you assign to a pool. The second concept is "Volume exported for a guest". I'm not sure that we want to Nest those concepts.
It is already nested, even if you don't usually see it in the Dom0 host. eg, in the host I assign a LVM volume to a guest. The guest then puts this into its own nested LVM VG & allocates volumes. This nesting isn't normally visible by default, but tools like kpartx make it visible.

Making this nesting visible in the host isn't necessarily something we need to expose in the APIs, but we should consider it when thinking about the storage concepts. Depending on how we end up modelling the storage APIs, we may end up getting the capability 'for free', so artificially restricting it upfront is premature.
Do we need an explicit Filesystem concept ?
Operations ==========
Limited set of operations to perform
- List host volumes (physical attached devices)
- List pools (logical volume groups, partitioned devs, filesystems)
- List pool volumes (dev partitions, LVM logical volumes, files)
- Define pool (eg create directory, or define iSCSI target)
- Undefine pool (delete directory, undefine iSCSI config)
- Activate pool (mount NFS volume, login to iSCSI target)
- Deactivate pool (unmount volume, logout of iSCSI)
- Dump pool XML (get all the metadata)
- Lookup by path
- Lookup by UUID
- Lookup by name
- Create volume (create a file, allocate a LVM LV, etc)
- Destroy volume (delete a file, deallocate a LVM LV)
- Resize volume (grow or shrink volume)
- Copy volume (copy data between volumes)
- Snapshot volume (snapshot a volume)
- Dump volume XML (get all the metadata)
- Lookup by path
- Lookup by UUID
- Lookup by name
http://www.redhat.com/archives/libvir-list/2007-February/msg00010.html
http://www.redhat.com/archives/libvir-list/2007-September/msg00119.html
Do we also need some explicit Filesystem APIs ?
The question I have with all of this is whether it really belongs in libvirt at all. Many of these concepts apply to bare-metal provisioning as well; so it might be a good idea to have a separate "libstorage" that libvirt links to, and that other tools might use.
It is a good question. My thought is that if we went for a 'libstorage' the scope would be dramatically broader than if we focused on the concepts we need for managing virtual machines. Or we provide it in libvirt and, as it evolves, we can factor it out into a standalone library. My inclination is to get a working implementation for libvirt before trying to over-generalize to serve non-virt related applications.
XML description ===============
The horrible recursiveness & specific attributes are all in the XML description for different storage pool / volume types. This is where we define things like what physical volumes are in a volume group, iSCSI server / target names, login details, etc, etc
XXX fill in the hard stuff for metadata description here
Implementation backends =======================
- FileSystem/Directory/File - POSIX APIs
- LVM - LVM tools, or libLVM
- Disk/partitions - sysfs / parted
- iSCSI - sysfs / iscsi utils
- ZFS - ZFS tools
The problem with most of these, as we all know, is that they only have command-line utilities, and no corresponding libraries. That makes it difficult for a library like libvirt to support them. That is, we can shell out to the commands, but then we run into a situation where different versions of the LVM command, for example, have different output. We have now effectively tied ourselves to a particular version of a tool, which is fairly disappointing. Also, as we have seen with xend, spawning external tools to do work makes error reporting far more difficult (maybe impossible).
Yes it is difficult, but we fundamentally have no choice. With a few exceptions there are no libraries we can use, so no matter what we want we'll end up having to invoke external tools to accomplish some tasks. There is work going on in places to improve library coverage (eg Jim Meyering is doing work on an LVM library), but depending on what OS releases we want to target we may or may not be able to leverage this.
So that leaves us the tough question of what to do here. Ideally we would abstract all of the above tools into libraries, and re-write the tools to use those, and then have libvirt use the same. I'm not sure if it is practical, however, to wait that long to do such things.
Libvirt is primarily a technology integration tool / library & as such we need to work with the capabilities that are deployed in the OS' we want to target. If we only want to provide storage management in Fedora 9 or newer, then we can possibly mandate the LVM library. If we want to support Fedora 8 or older, we need to use the LVM command line tools. As long as the details are well hidden, we can support both, or switch from one to the other in the future.
What I might suggest is a hybrid approach. Do the initial implementation with what we have (namely the command line utilities, possibly utilizing rjones' "scriptlet" concept). In parallel, make sure someone starts on making real libraries of the tools, so that we can benefit from that later on.
Yep, this is already going on - eg the LVM library.

Dan.

Daniel P. Berrange wrote:
It is already nested, even if you don't usually see it in the Dom0 host. eg, in the host I assign a LVM volume to a guest. The guest then puts this into its own nested LVM VG & allocates volumes. This nesting isn't normally visible by default, but tools like kpartx make it visible.
Making this nesting visible in the host isn't necessarily something we need to expose in the APIs, but we should consider it when thinking about the storage concepts. Depending on how we end up modelling the storage APIs, we may end up getting the capability 'for free', so artificially restricting it upfront is premature.
That's true. I've always viewed the storage allocated to a guest, whether it be a partition, a file, etc., as an opaque object that the guest can do anything it wants with, even if the host doesn't understand it. There is an argument to be made for the host being able to do maintenance on guest disks (if, for some reason, you can't boot the guest); but I honestly think this sort of maintenance is best done inside the guest container (via the "normal" guest rescue modes - LiveCD, Knoppix, Windows Rescue, etc).
It is a good question. My thought is that if we went for a 'libstorage' the scope would be dramatically broader than if we focused on the concepts we need for managing virtual machines. Or we provide it in libvirt and, as it evolves, we can factor it out into a standalone library. My inclination is to get a working implementation for libvirt before trying to over-generalize to serve non-virt related applications.
Agreed that the scope for a libstorage could get out of hand. But I think if we keep the initial scope targeted enough for what we want (i.e. integration with libvirt/virt matters), we can get away with having a separate library, and the rest can grow organically from that. Really, the only part that needs to be generic is the XML; if we can get that down, the implementation doesn't matter. Basically with this model we wouldn't be dependent on libvirt internal data structures/functions, so we wouldn't have the pain of extracting it later. Of course, that is more work too :).

Chris Lalancette

Hi Dan, FWIW, this looks pretty much spot on to me ... I'm not sure there's a lot to discuss :-) On Tue, 2007-10-16 at 16:19 +0100, Daniel P. Berrange wrote:
Application users =================
- virt-manager / virt-install
  - Enumerate available pools
  - Allocate volume from pool
  - Create guest with volume
Nice that you list these and concentrate on them.
Two core concepts
- Volume
  - a chunk of storage
  - assignable to a guest
  - assignable to a pool
  - optionally part of a pool
- Pool
  - a chunk of storage
  - contains free space
  - allocate to provide volumes
  - comprised of volumes
Recursive!
n x Volume -> Pool -> n x Volume
Nesting to many levels...
Hmm, I'd try and avoid the confusion associated with this nesting concept ... What kind of uses for it are you thinking? (e.g. something like allocate a big raw sparse volume, create a pool from it and then allocate volumes from that pool?)

Is it that you need to somehow represent the storage that a pool can allocate from, and that that winds up being similar to how you represent storage that is assignable to a pool? If that's the case, maybe fold the concept of "what a pool allocates from" into the pool concept and make the volume concept just about "what is assignable to guests".
Operations ==========
Limited set of operations to perform
- List host volumes (physical attached devices)
- List pools (logical volume groups, partitioned devs, filesystems)
- List pool volumes (dev partitions, LVM logical volumes, files)
Perhaps there should be a default pool for each host, so that to list host volumes you just list the volumes from the default pool?

Cheers,
Mark.

On Wed, Oct 17, 2007 at 02:55:21PM +0100, Mark McLoughlin wrote:
Hi Dan, FWIW, this looks pretty much spot on to me ... I'm not sure there's a lot to discuss :-)
On Tue, 2007-10-16 at 16:19 +0100, Daniel P. Berrange wrote:
Application users =================
- virt-manager / virt-install
  - Enumerate available pools
  - Allocate volume from pool
  - Create guest with volume
Nice that you list these and concentrate on them.
Two core concepts
- Volume
  - a chunk of storage
  - assignable to a guest
  - assignable to a pool
  - optionally part of a pool
- Pool
  - a chunk of storage
  - contains free space
  - allocate to provide volumes
  - comprised of volumes
Recursive!
n x Volume -> Pool -> n x Volume
Nesting to many levels...
Hmm, I'd try and avoid the confusion associated with this nesting concept ...
What kind of uses for it are you thinking?
This mention of recursion seems to have caused a lot of confusion... All I really mean by it is that libvirt has two notions:

- A volume
- A pool

When you define a pool, the XML description may refer to one or more volumes which are the source of the pool. eg if you define a new LVM volume group, you provide one or more physical volumes. Given a pool, you may carve out one or more volumes. eg you carve out logical volumes.

So, the APIs at the libvirt level aren't directly 'recursive' - you just have a concept of a pool & a volume object. As you work with these two concepts you may end up creating things which are recursive in nature. In fact, even if you don't consciously define anything recursive, it is indirectly recursive, since a Fedora guest will turn a disk it is assigned into a LVM vol group & logical vols.

So in summary, the 'recursion' is just a fundamental property of the storage stack, but not something we need to directly express in libvirt APIs - the mere concepts of a volume & a pool are sufficient.
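[Editorial note: Dan's point in miniature -- with only the two notions, a Pool defined from source volumes and Volumes carved out of a Pool, multi-level nesting falls out on its own. All names in this sketch are illustrative, not API.]

```python
# Hypothetical two-notion model: nesting emerges without being modelled.

class Volume:
    def __init__(self, name):
        self.name = name

class Pool:
    def __init__(self, name, sources):
        self.name = name
        self.sources = sources  # the volumes this pool is built from

    def carve(self, name):
        # allocate a new volume out of the pool's free space
        return Volume(name)

# in the host: a partition becomes the physical volume of a VG
pv = Volume("/dev/sda3")
host_vg = Pool("XenVolGroup", [pv])
guest_disk = host_vg.carve("guest1-disk")

# inside the guest: the very same volume seeds another pool level
guest_vg = Pool("VolGroup00", [guest_disk])
guest_root = guest_vg.carve("LogVol00")
```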
Operations ==========
Limited set of operations to perform
- List host volumes (physical attached devices)
- List pools (logical volume groups, partitioned devs, filesystems)
- List pool volumes (dev partitions, LVM logical volumes, files)
Perhaps there should be a default pool for each host so that to list host volumes you just list the volumes from the default pool?
It depends on the deployment scenario, but certainly in a 'fat dom0' scenario I imagine you could always provide a default pool (eg /var/lib/xen/images).

Whether to treat the host as a pool for its physically attached devices is an interesting idea. One alternative is to have an explicit API for listing all host devices (eg, 'lshal'), since I'd certainly like to be able to enumerate any USB devices & any PCI devices, as well as any physical network adapters.

Dan.

On Wed, Oct 17, 2007 at 04:02:01PM +0100, Daniel P. Berrange wrote:
On Wed, Oct 17, 2007 at 02:55:21PM +0100, Mark McLoughlin wrote:
Recursive!
n x Volume -> Pool -> n x Volume
Nesting to many levels...
Hmm, I'd try and avoid the confusion associated with this nesting concept ...
What kind of uses for it are you thinking?
This mention of recursion seems to have caused alot of confusion...
Recursion is actually the wrong word. It is really a directed acyclic graph / multi-level hierarchy. Still, we only need 2 levels in any libvirt API: a pool & a volume.
All I really mean by it is that libvirt has two notions
- A volume
- A pool
When you define a pool, the XML description may refer to one or more volumes which are the source of the pool. eg if you define a new LVM volume group, you provide one or more physical volumes.
Given a pool, you may carve out one or more volumes. eg you carve out logical volumes.
So, the APIs at the libvirt level aren't directly 'recursive' - you just have a concept of a pool & a volume object. As you work with these two concepts you may end up creating things which are recursive in nature. In fact, even if you don't consciously define anything recursive, it is indirectly recursive, since a Fedora guest will turn a disk it is assigned into a LVM vol group & logical vols.
So in summary, the 'recursion' is just a fundamental property of the storage stack, but not something we need to directly express in libvirt APIs - the mere concepts of a volume & a pool is sufficient.
Operations ==========
Limited set of operations to perform
- List host volumes (physical attached devices)
- List pools (logical volume groups, partitioned devs, filesystems)
- List pool volumes (dev partitions, LVM logical volumes, files)
Perhaps there should be a default pool for each host so that to list host volumes you just list the volumes from the default pool?
It depends on the deployment scenario, but certainly in a 'fat dom0' scenario I imagine you could always provide a default pool (eg /var/lib/xen/images)
Whether to treat the host as a pool for its physically attached devices is an interesting idea. One alternative is to have an explicit API for listing all host devices (eg, 'lshal'), since I'd certainly like to be able to enumerate any USB devices & any PCI devices, as well as any physical network adapters.
Dan. -- |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|
-- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On Tue, Oct 16, 2007 at 04:19:29PM +0100, Daniel P. Berrange wrote:
Storage attributes
==================
- Local vs network (ext3 vs NFS, SCSI vs iSCSI)
- Private vs shared (IDE vs FibreChannel)
- Pool vs volume (LVM VG vs LV, Directory vs File, Disk vs Partition)
- Container vs guest (OpenVZ vs Xen)
- Attributes
  - Compressed
  - Encrypted
  - Auto-extend
- Snapshots
  - RO
  - RW
- Partition table
  - MBR
  - GPT
- UUID
  - 16 hex digits
  - Unique string
  - SCSI WWID (world wide ID)
- Local Path(s) (/dev/sda, /var/lib/xen/images/foo.img)
- Server Hostname
- Server Identifier (export path/target)
- MAC security label (SELinux)
- Redundancy
  - Mirrored
  - Striped
  - Multipath
- Pool operation
  - RO
  - RW
It was mentioned offlist that I didn't include security/authorization in this mail. I had it in my offline notes...

- NFS
  - server side ACL based on client IP ranges
  - Kerberos GSSAPI. Client credentials taken from /etc/krb5.tab
- iSCSI
  - server side ACL based on client IP ranges
  - CHAP username+password supplied when attaching target to client
  - Spec for Kerberos. Not GSSAPI based. Not implemented in the Linux client or server. Frowned upon by IETF kerberos experts since it isn't GSSAPI
- QCow
  - passphrase needed by the process (eg QEMU) accessing the file
- dm-crypt
  - passphrase needed when activating the volume

Dan.
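These per-protocol credentials could map onto the draft pool XML along the following lines. This is a sketch only: the element and attribute names follow the `<auth>` and `<encrypt>` examples elsewhere in this thread, not a finalised schema.

```xml
<!-- Sketch: possible <auth> representations, per the draft schema -->

<!-- iSCSI with CHAP -->
<source host="someserver" export="sometarget">
  <auth type="chap" username="joe" password="123456"/>
</source>

<!-- NFS with Kerberos, credentials taken from a keytab -->
<source host="someserver" export="/vol/files">
  <auth type="kerberos" keytab="/etc/server.tab"/>
</source>

<!-- QCow encryption: passphrase held by the accessing process -->
<format type="qcow2">
  <encrypt key="123456"/>
</format>
```

The IP-range ACLs are server-side policy, so they arguably do not belong in the client-side pool description at all.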

On Tue, Oct 16, 2007 at 04:19:29PM +0100, Daniel P. Berrange wrote:
Application users
=================
- virt-manager / virt-install
  - Enumerate available pools
  - Allocate volume from pool
  - Create guest with volume
When we support migration the storage API should let us do sanity checking prior to migration. The metadata provided for a pool and a volume should allow an algorithm sort of like this.

For each disk assigned to the guest:

- Lookup volume associated with the path on the source host
- Lookup volume associated with the path on the dest host
- If the dest volume is missing, refuse to migrate
- If the dest volume has a different UUID, refuse to migrate (sync UUID to SCSI worldwide name perhaps?)
- Lookup pool associated with the volume on the source host
- Lookup pool associated with the volume on the dest host
- If the pool is different, then refuse to migrate (catches the case of a different NFS mount being used, or it being a local internal storage pool, for example)

Dan.
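That check loop is mechanical enough to sketch directly. The `Host`/`Vol`/`Pool` types and the `lookup_volume()` call below are stand-ins for whatever the real API ends up returning, not existing libvirt functions:

```python
# Sketch of the pre-migration sanity check. All names here are
# hypothetical stand-ins, not real libvirt API.
from collections import namedtuple

Pool = namedtuple("Pool", "uuid")
Vol = namedtuple("Vol", "uuid pool")

class Host:
    def __init__(self, volumes):
        self._volumes = volumes            # maps guest disk path -> Vol

    def lookup_volume(self, path):
        return self._volumes.get(path)

def can_migrate(guest_disks, src, dst):
    """True only if every guest disk resolves to the same volume and
    pool on both hosts."""
    for path in guest_disks:
        src_vol = src.lookup_volume(path)
        dst_vol = dst.lookup_volume(path)
        if dst_vol is None:
            return False                   # dest volume missing
        if dst_vol.uuid != src_vol.uuid:
            return False                   # different volume at that path
        if dst_vol.pool.uuid != src_vol.pool.uuid:
            return False                   # eg a different NFS mount
    return True
```

The pool comparison is what catches the subtle failure mode: identical paths on both hosts that are actually backed by different storage.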

Here is an updated document about storage which attempts to formulate the various pieces of metadata in an XML representation. In addition I have introduced a 3rd concept of a 'device'. A device represents any physical device attached to a host, be it a disk, a sound card, a USB gizmo, or anything else you would see with 'lshal' (or in sysfs).

I considered Mark's suggestion that we have a 'host pool' in which physical storage devices live, but I think it is important to directly represent the physical devices as a concept separately from pools & volumes. This is because we need this in other areas - we need network device info when setting up networking in guests & virtual networks, we need USB & PCI device info to do device pass-through to the guest, etc. Finally, I have also included more info about permissions & security.

Taxonomy of storage types
=========================

 |
 +- Block
 |   |
 |   +- Disk
 |   |   |
 |   |   +- Direct attached
 |   |   |   |
 |   |   |   +- IDE/ATA disk
 |   |   |   +- SCSI disk
 |   |   |   +- FibreChannel disk
 |   |   |   +- USB disk/flash
 |   |   |   +- FireWire disk/flash
 |   |   |
 |   |   +- Remote attached
 |   |       |
 |   |       +- iSCSI disk
 |   |       +- GNBD disk
 |   |
 |   +- Partition
 |   |
 |   +- Virtual
 |       |
 |       +- Direct attached
 |       |   |
 |       |   +- LVM
 |       |   +- ZFS
 |       |
 |       +- Remote attached
 |           |
 |           +- Cluster LVM
 |
 +- FileSystem
 |   |
 |   +- Direct attached
 |   |   |
 |   |   +- ext2/3/4
 |   |   +- xfs
 |   |   +- ZFS
 |   |
 |   +- Remote attached
 |       |
 |       +- NFS
 |       +- GFS
 |       +- OCFS2
 |
 +- Directory
 |
 +- File
     |
     +- Raw allocated
     +- Raw sparse
     +- QCow2
     +- VMDK

Storage attributes
==================

- Local vs network (ext3 vs NFS, SCSI vs iSCSI)
- Private vs shared (IDE vs FibreChannel)
- Pool vs volume (LVM VG vs LV, Directory vs File, Disk vs Partition)
- Container vs guest (OpenVZ vs Xen)
- Attributes
  - Compressed
  - Encrypted
  - Auto-extend
- Snapshots
  - RO
  - RW
- Partition table
  - MBR
  - GPT
- UUID
  - 16 hex digits
  - Unique string
  - SCSI WWID (world wide ID)
- Local Path(s)
- Server Hostname
- Server Identifier (export path/target)
- MAC security label (SELinux)
- Redundancy (mirrored/striped/multipath)
- Pool operation
  - RO
  - RW
- Authentication
  - Username / Password
  - Client IP/MAC address
  - Kerberos / GSSAPI
  - Passphrase

Nesting hierarchy
=================

- 1 x Host -> N x iSCSI target -> N x LUN -> N x Partition
- N x Disk/Partition -> 1 x LVM VG -> N x LVM LV
- 1 x Filesystem -> N x directory -> N x file
- 1 x File -> 1 x Block (loopback)

Application users
=================

- virt-manager / virt-install
  - Enumerate available pools
  - Allocate volume from pool
  - Create guest with volume
- virt-clone
  - Copy disks
  - Snapshot disks
- virt-df
  - Filesystem usage
- pygrub
  - Extract kernel/initrd from filesystem
- virt-factory
  - Manage storage pools
  - Pre-migration sanity checks
- virt-backup
  - Snapshot disks
- virt-p2v
  - Snapshot disks

Storage representation
======================

Three core concepts:

- Device - a physical device attached to a host
  - associated with a bus / subsystem (scsi, usb, ide, etc)
  - bus-specific identifier (vendor+product ID?)
  - a driver type
  - unique id / serial number
  - device name for its current mapping into the filesystem
- Pool - a pool of storage
  - contains free space
  - allocated to provide volumes
  - comprised of devices, or a remote server
- Volume - a chunk of storage
  - assignable to a guest
  - part of a pool

XML description
===============

Storage pools
-------------

High level:

- Type - the representation of the storage pool
- Source - the underlying data storage location
- Target - mapping to local filesystem (if applicable)

The XML only provides information that describes the pool itself, ie information about the physical devices underlying the pool is not maintained here.
- A directory within a filesystem

    <pool type="dir">
      <name>xenimages</name>
      <uuid>12345678-1234-1234-1234-123456781234</uuid>
      <target file="/var/lib/xen/images"/>
      <permissions>
        <mode>0700</mode>
        <owner>root</owner>
        <group>virt</group>
        <label>xen_image_t</label>
      </permissions>
    </pool>

- A dedicated filesystem

    <pool type="fs">
      <name>xenimages</name>
      <uuid>12345678-1234-1234-1234-123456781234</uuid>
      <source dev="/dev/sda1"/>
      <target file="/var/lib/xen/images"/>
      <permissions>
        <mode>0700</mode>
        <owner>root</owner>
        <group>virt</group>
        <label>xen_image_t</label>
      </permissions>
    </pool>

- A dedicated disk

    <pool type="disk">
      <name>xenimages</name>
      <uuid>12345678-1234-1234-1234-123456781234</uuid>
      <source dev="/dev/sda"/>
      <permissions>
        <mode>0700</mode>
        <owner>root</owner>
        <group>virt</group>
        <label>xen_image_t</label>
      </permissions>
    </pool>

- A logical volume group with 3 physical volumes

    <pool type="lvm">
      <name>xenimages</name>
      <uuid>12345678-1234-1234-1234-123456781234</uuid>
      <source dev="/dev/sda1"/>
      <source dev="/dev/sdb1"/>
      <source dev="/dev/sdc1"/>
      <target dev="/dev/VirtVG"/>
    </pool>

- A network filesystem

    <pool type="nfs">
      <name>xenimages</name>
      <uuid>12345678-1234-1234-1234-123456781234</uuid>
      <source host="someserver" export="/vol/files">
        <auth type="kerberos" keytab="/etc/server.tab"/>
      </source>
      <target file="/var/lib/xen/images"/>
      <permissions>
        <mode>0700</mode>
        <owner>root</owner>
        <group>virt</group>
        <label>xen_image_t</label>
      </permissions>
    </pool>

- An iSCSI target

    <pool type="iscsi">
      <name>xenimages</name>
      <uuid>12345678-1234-1234-1234-123456781234</uuid>
      <source host="someserver" export="sometarget">
        <auth type="chap" username="joe" password="123456"/>
      </source>
    </pool>

XXX Some kind of indication as to whether a pool allows creation of new volumes, or merely use of existing ones
XXX flag for whether volumes will be file or block based
XXX capacity / usage information if available
XXX indicate whether a pool can be activated/deactivated, vs being permanently in an active state

Storage volumes
---------------

High level:

- Unique name within pool
- Data format type (qcow, raw, vmdk)

- FS / Dir / NFS volume

    <volume type="file">
      <name>foo</name>
      <format type="qcow2">
        <encrypt key="123456"/>
        <compress/>
      </format>
      <capacity>1000000</capacity>
      <allocation>100</allocation>
      <permissions>
        <mode>0700</mode>
        <owner>root</owner>
        <group>virt</group>
        <label>xen_image_t</label>
      </permissions>
      <target file="/var/lib/xen/images/foo.img"/>
    </volume>

- iSCSI / LVM / Partition

    <volume type="block">
      <name>foo</name>
      <capacity>1000000</capacity>
      <allocation>100</allocation>
      <permissions>
        <mode>0700</mode>
        <owner>root</owner>
        <group>virt</group>
        <label>xen_image_t</label>
      </permissions>
      <target dev="/dev/HostVG/foo"/>
      <snapshots>
        <snapshot name="bar"/>
      </snapshots>
    </volume>

XXX VMWare's VMDK can be made up of many chained files
XXX QCow stores snapshots internally, with a name, while LVM stores them as separate volumes with a link. Listing snapshots alongside the master volume seems to allow both to be represented.
XXX flag to indicate whether it is resizable?

Host devices
------------

This is not just limited to storage devices. Basically a representation of the same data provided by HAL (cf lshal)

- Opaque name, vendor & product strings
- Subsystem specific unique identifier for vendor/product/model
- Capability type, eg storage, sound, network, etc

    <device>
      <name>/org/freedesktop/Hal/devices/volume_part2_size_99920701440</name>
      <vendor name="Some Vendor"/>
      <product name="Some Disk"/>
      <subsystem type="usb">
        <product id="2345"/>
        <vendor id="2345"/>
      </subsystem>
      <class type="storage">
        <block dev="/dev/
        <bus type="ide"/>
        <drive type="cdrom"/>
      </class>
    </device>

NB, 'class' sort of maps to HAL's 'capability' field, though HAL allows for multiple capabilities per device.
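For a sense of how little machinery a client needs to consume these documents, here is a sketch that pulls the high-level fields out of a draft pool description with only the standard library XML parser. The schema is the draft one from this thread, not a finalised format:

```python
# Sketch: parse the draft pool XML from this thread (schema is a
# proposal, not a finalised libvirt format) using only the stdlib.
import xml.etree.ElementTree as ET

POOL_XML = """
<pool type="fs">
  <name>xenimages</name>
  <uuid>12345678-1234-1234-1234-123456781234</uuid>
  <source dev="/dev/sda1"/>
  <target file="/var/lib/xen/images"/>
</pool>
"""

def parse_pool(xml_text):
    """Return the type/source/target triple described under 'High level'."""
    root = ET.fromstring(xml_text)
    target = root.find("target")
    return {
        "type": root.get("type"),
        "name": root.findtext("name"),
        "uuid": root.findtext("uuid"),
        "sources": [s.get("dev") for s in root.findall("source")],
        "target": target.get("file") if target is not None else None,
    }

pool = parse_pool(POOL_XML)
```

Multiple `<source>` elements (as in the LVM example) naturally come back as a list, which fits the "pool built from one or more volumes" model.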
Operations
==========

Limited set of operations to perform

For devices:

- List devices
- List devices by class
- Lookup by path
- Lookup by name

For pools:

- List pools (logical volume groups, partitioned devs, filesystems)
- Define pool (eg create directory, or define iSCSI target)
- Undefine pool
- Activate pool (mount NFS volume, login to iSCSI target)
- Deactivate pool
- Dump pool XML
- Lookup by path
- Lookup by UUID
- Lookup by name

For volumes:

- List volumes (takes a pool as a param)
- Create volume (takes a pool as a param)
- Destroy volume
- Resize volume
- Copy volume
- Snapshot volume
- Dump volume XML
- Lookup by path
- Lookup by UUID
- Lookup by name

http://www.redhat.com/archives/libvir-list/2007-February/msg00010.html
http://www.redhat.com/archives/libvir-list/2007-September/msg00119.html

Implementation
==============

- devices - sysfs / HAL
- FileSystem/Directory/File - POSIX APIs
- LVM - lvm tools
- Disk/partitions - sysfs / HAL / parted
- iSCSI - sysfs / HAL / iscsi utils
- ZFS - ZFS tools

NB, HAL gets all its info from sysfs, so we can choose to use HAL, or go directly to sysfs. The former is more easily portable to Solaris, but does require the software dependency stack to include HAL, DBus, ConsoleKit, PolicyKit and GLib. We already have a DBus dep via Avahi.

Dan.
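The "go directly to sysfs" option for block device enumeration is nearly trivial; a minimal sketch, with the sysfs root overridable so it can be exercised off a real Linux host (the function name is illustrative, not a proposed API):

```python
# Sketch: enumerate block devices straight from sysfs, the non-HAL
# option discussed above. Function name is illustrative only.
import os

def list_block_devices(sysfs_root="/sys/block"):
    """Return block device names visible under sysfs (eg sda, dm-0)."""
    try:
        return sorted(os.listdir(sysfs_root))
    except OSError:
        return []        # no sysfs here (non-Linux host, test environment)
```

What this cannot give you is the vendor/product/capability metadata in the device XML above; that is exactly the gap HAL fills, at the cost of the dependency stack noted in the Implementation section.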
participants (4)

- Chris Lalancette
- Daniel P. Berrange
- Mark McLoughlin
- Richard W.M. Jones