Here is an updated document about storage which attempts to formulate
the various pieces of metadata in an XML representation. In addition I
have introduced a third concept of a 'device'. A device represents any
physical device attached to a host, be it a disk, a sound card, a USB
gizmo, or anything else you would see with 'lshal' (or in sysfs). I
considered Mark's suggestion that we have a 'host pool' in which
physical storage devices live, but I think it is important to directly
represent the physical devices as a concept separate from pools &
volumes. This is because we need this in other areas - we need network
device info when setting up networking in guests & virtual networks, we
need USB & PCI device info to do device pass-through to the guest, etc.
Finally, I have also included more info about permissions & security.
Taxonomy of storage types
=========================
|
+- Block
| |
| +- Disk
| | |
| | +- Direct attached
| | | |
| | | +- IDE/ATA disk
| | | +- SCSI disk
| | | +- FibreChannel disk
| | | +- USB disk/flash
| | | +- FireWire disk/flash
| | |
| | +- Remote attached
| | |
| | +- iSCSI disk
| | +- GNBD disk
| |
| +- Partition
| |
| +- Virtual
| |
| +- Direct attached
| | |
| | +- LVM
| | +- ZFS
| |
| +- Remote attached
| |
| +- Cluster LVM
|
+- FileSystem
| |
| +- Direct attached
| | |
| | +- ext2/3/4
| | +- xfs
| | +- ZFS
| |
| +- Remote attached
| |
| +- NFS
| +- GFS
| +- OCFS2
|
+- Directory
|
+- File
|
+- Raw allocated
+- Raw sparse
+- QCow2
+- VMDK
Storage attributes
==================
- Local vs network (ext3 vs NFS, SCSI vs iSCSI)
- Private vs shared (IDE vs FibreChannel)
- Pool vs volume (LVM VG vs LV, Directory vs File, Disk vs Partition)
- Container vs guest (OpenVZ vs Xen)
- Attributes
- Compressed
- Encrypted
- Auto-extend
- Snapshots
- RO
- RW
- Partition table
- MBR
- GPT
- UUID
- 16 hex digits
- Unique string
- SCSI WWID (world wide ID)
- Local Path(s)
- Server Hostname
- Server Identifier (export path/target)
- MAC security label (SELinux)
- Redundancy (mirrored/striped/multipath)
- Pool operation
- RO
- RW
- Authentication
- Username / Password
- Client IP/MAC address
- Kerberos / GSSAPI
- Passphrase
Nesting hierarchy
=================
- 1 x Host -> N x iSCSI target -> N x LUN -> N x Partition
- N x Disk/Partition -> 1 x LVM VG -> N x LVM LV
- 1 x Filesystem -> N x directory -> N x file
- 1 x File -> 1 x Block (loopback)
Application users
=================
- virt-manager / virt-install
- Enumerate available pools
- Allocate volume from pool
- Create guest with volume
- virt-clone
- Copy disks
- Snapshot disks
- virt-df
- Filesystem usage
- pygrub
- Extract kernel/initrd from filesystem
- virt-factory
- Manage storage pools
- Pre-migration sanity checks
- virt-backup
- Snapshot disks
- virt-p2v
- Snapshot disks
Storage representation
======================
Three core concepts
- Device
- a physical device attached to a host
- associated with a bus / subsystem (scsi,usb,ide,etc)
- bus specific identifier (vendor+product ID?)
- a driver type
- unique id / serial number
- device name for its current mapping into the filesystem
- Pool
- a pool of storage
- contains free space
- allocate to provide volumes
- composed of devices, or a remote server
- Volume
- a chunk of storage
- assignable to a guest
- part of a pool
XML description
===============
Storage pools
-------------
High level
- Type - the representation of the storage pool
- Source - the underlying data storage location
- Target - mapping to local filesystem (if applicable)
The XML only provides information that describes the pool itself,
i.e. information about the physical devices underlying the pool is
not maintained here.
- A directory within a filesystem
<pool type="dir">
<name>xenimages</name>
<uuid>12345678-1234-1234-1234-123456781234</uuid>
<target file="/var/lib/xen/images"/>
<permissions>
<mode>0700</mode>
<owner>root</owner>
<group>virt</group>
<label>xen_image_t</label>
</permissions>
</pool>
- A dedicated filesystem
<pool type="fs">
<name>xenimages</name>
<uuid>12345678-1234-1234-1234-123456781234</uuid>
<source dev="/dev/sda1"/>
<target file="/var/lib/xen/images"/>
<permissions>
<mode>0700</mode>
<owner>root</owner>
<group>virt</group>
<label>xen_image_t</label>
</permissions>
</pool>
- A dedicated disk
<pool type="disk">
<name>xenimages</name>
<uuid>12345678-1234-1234-1234-123456781234</uuid>
<source dev="/dev/sda"/>
<permissions>
<mode>0700</mode>
<owner>root</owner>
<group>virt</group>
<label>xen_image_t</label>
</permissions>
</pool>
- A logical volume group with 3 physical volumes
<pool type="lvm">
<name>xenimages</name>
<uuid>12345678-1234-1234-1234-123456781234</uuid>
<source dev="/dev/sda1"/>
<source dev="/dev/sdb1"/>
<source dev="/dev/sdc1"/>
<target dev="/dev/VirtVG"/>
</pool>
- A network filesystem
<pool type="nfs">
<name>xenimages</name>
<uuid>12345678-1234-1234-1234-123456781234</uuid>
<source host="someserver" export="/vol/files">
<auth type="kerberos" keytab="/etc/server.tab"/>
</source>
<target file="/var/lib/xen/images"/>
<permissions>
<mode>0700</mode>
<owner>root</owner>
<group>virt</group>
<label>xen_image_t</label>
</permissions>
</pool>
- An iSCSI target
<pool type="iscsi">
<name>xenimages</name>
<uuid>12345678-1234-1234-1234-123456781234</uuid>
<source host="someserver" export="sometarget">
<auth type="chap" username="joe"
password="123456"/>
</source>
</pool>
XXX Some kind of indication as to whether a pool allows
creation of new volumes, or merely use of existing ones
XXX flag for whether volumes will be file or block based
XXX capacity / usage information if available
XXX indicate whether pool can be activated/deactivated, vs
permanently in active state
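As a purely illustrative sketch of the capacity / usage idea (attaching
<capacity>/<allocation> to a pool is just a guess here, borrowing the
elements used for volumes below, and the numbers are made up), an
active LVM pool might report something like:
<pool type="lvm">
<name>xenimages</name>
<uuid>12345678-1234-1234-1234-123456781234</uuid>
<capacity>100000000</capacity>
<allocation>25000000</allocation>
<target dev="/dev/VirtVG"/>
</pool>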
Storage volumes
---------------
High level
- Unique name within pool
- Data format type (qcow, raw, vmdk)
- FS / Dir / NFS volume
<volume type="file">
<name>foo</name>
<format type="qcow2">
<encrypt key="123456"/>
<compress/>
</format>
<capacity>1000000</capacity>
<allocation>100</allocation>
<permissions>
<mode>0700</mode>
<owner>root</owner>
<group>virt</group>
<label>xen_image_t</label>
</permissions>
<target file="/var/lib/xen/images/foo.img"/>
</volume>
- iSCSI / LVM / Partition
<volume type="block">
<name>foo</name>
<capacity>1000000</capacity>
<allocation>100</allocation>
<permissions>
<mode>0700</mode>
<owner>root</owner>
<group>virt</group>
<label>xen_image_t</label>
</permissions>
<target dev="/dev/HostVG/foo"/>
<snapshots>
<snapshot name="bar"/>
</snapshots>
</volume>
XXX VMware's VMDK can be made up of many chained files
XXX QCow stores snapshots internally, with a name, while
LVM stores them as separate volumes with a link. Listing
snapshots alongside the master volume seems to allow both
to be represented.
XXX flag to indicate whether it is resizeable ?
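For what it's worth, the same <snapshots> element used in the block
example above could presumably also list qcow2's internal snapshots by
name, e.g. (sketch only - the snapshot name is made up, and whether
this is the right representation is exactly the open question above):
<volume type="file">
<name>foo</name>
<format type="qcow2"/>
<target file="/var/lib/xen/images/foo.img"/>
<snapshots>
<snapshot name="base-install"/>
</snapshots>
</volume>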
Host devices
------------
This is not just limited to storage devices. Basically it is a
representation of the same data provided by HAL (cf. lshal).
- Opaque name, vendor & product strings
- Subsystem specific unique identifier for vendor/product/model
- Capability type, eg storage, sound, network, etc
<device>
<name>/org/freedesktop/Hal/devices/volume_part2_size_99920701440</name>
<vendor name="Some Vendor"/>
<product name="Some Disk"/>
<subsystem type="usb">
<product id="2345"/>
<vendor id="2345"/>
</subsystem>
<class type="storage">
<block dev="/dev/..."/>
<bus type="ide"/>
<drive type="cdrom"/>
</class>
</device>
NB, 'class' sort of maps to HAL's 'capability' field, though HAL
allows for multiple capabilities per device.
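If the multiple-capabilities case matters, one option (just a sketch,
not a settled design - the device name and class types here are made-up
examples borrowed from HAL capability names) would be to let the
<class> element repeat, one per capability:
<device>
<name>/org/freedesktop/Hal/devices/some_device</name>
<class type="block">
<bus type="usb"/>
</class>
<class type="storage">
<bus type="usb"/>
<drive type="disk"/>
</class>
</device>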
Operations
==========
Limited set of operations to perform
For devices:
- List devices
- List devices by class
- Lookup by path
- Lookup by name
For pools:
- List pools (logical volume groups, partitioned devs, filesystems)
- Define pool (eg create directory, or define iSCSI target)
- Undefine pool
- Activate pool (mount NFS volume, login to iSCSI target)
- Deactivate pool
- Dump pool XML
- Lookup by path
- Lookup by UUID
- Lookup by name
For volumes:
- List volumes (takes a pool as a param)
- Create volume (takes a pool as a param)
- Destroy volume
- Resize volume
- Copy volume
- Snapshot volume
- Dump volume XML
- Lookup by path
- Lookup by UUID
- Lookup by name
http://www.redhat.com/archives/libvir-list/2007-February/msg00010.html
http://www.redhat.com/archives/libvir-list/2007-September/msg00119.html
Implementation
==============
- devices - sysfs / HAL
- FileSystem/Directory/File - POSIX APIs
- LVM - lvm tools
- Disk/partitions - sysfs / HAL / parted
- iSCSI - sysfs / HAL / iscsi utils
- ZFS - ZFS tools
NB, HAL gets all its info from sysfs, so we can choose to use HAL, or
go directly to sysfs. The former is more easily portable to Solaris,
but does require the software dependency stack to include HAL, DBus,
ConsoleKit, PolicyKit and GLib. We already have a DBus dep via Avahi.
Dan.