This one is the "unknown" for me. What happens if you create
Xzfs/images/vol1 (or your command below) without first creating
Xzfs/images?

Answer: it fails, unless you give the '-p' flag.
-p
  Creates all the non-existing parent datasets. Datasets created in
  this manner are automatically mounted according to the mountpoint
  property inherited from their parent. Any property specified on the
  command line using the -o option is ignored. If the target
  filesystem already exists, the operation completes successfully.
Example: given an existing zfs pool called "zfs":

# zfs create zfs/foo/bar
cannot create 'zfs/foo/bar': parent does not exist
# zfs create -p zfs/foo/bar
# zfs list zfs/foo
NAME      USED  AVAIL  REFER  MOUNTPOINT
zfs/foo   192K  23.5G    96K  /zfs/foo
# zfs list -r zfs/foo
NAME         USED  AVAIL  REFER  MOUNTPOINT
zfs/foo      192K  23.5G    96K  /zfs/foo
zfs/foo/bar   96K  23.5G    96K  /zfs/foo/bar
However, I don't see this as a problem for libvirt. The parent should
already exist when you define the pool, and I expect libvirt will only
create immediate children.
If one digs into virStorageBackendZFSBuildPool, they will see that
libvirt's pool create/build processing runs "zpool create $name
$path[0...n]", where $name is the "source.name" (in your case
Xzfs/images) and $path[0...n] are the various source paths (in your
case tmp/Xzfs).
Just to be clear, creating a zpool ("zpool create") is different to
creating a zfs dataset ("zfs create").
By analogy to LVM: a zpool is like a volume group, and a zfs
dataset/zvol is like a logical volume.
A zpool (or VG) is created from a collection of block devices - or
something which looks like a block device, e.g. a partition or a
loopback-mounted file. Those are $path[0...n] in the above, and would
be called "physical volumes" in LVM.
"zfs create" then creates a dataset (filesystem) or zvol (block device)
which draws space out of the zpool. The analogous operation in LVM is
"lvcreate", although it will only give you a block device - it's up to
you to make a filesystem within it.
In summary:
zpool create ==> vgcreate (*)
zfs create -V ==> lvcreate
(*) LVM also requires you to label the block devices with "pvcreate"
before you can add them to a volume group. zpool create doesn't require
this.
From my point of view, as a libvirt user: I *could* dedicate an entire
zpool to libvirt, but I don't want to. It would mean libvirt has full
ownership of that set of physical disks, and I may want to use the space
for other things as well.
What I want to do is to allow libvirt to use an existing zpool, with a
parent dataset which it can allocate underneath, like this:
zfs create zfs/libvirt
virsh pool-define-as --name zfs --source-name zfs/libvirt --type zfs
(instead of using pool-create/pool-build). This not only makes it clear
which datasets belong to libvirt, but allows me to do things like
storage accounting at the parent dataset level.
And actually, this almost works. It's just the pool refresh which fails,
because it tries to treat "zfs/images" as if it were a zpool. Stripping
off everything from the first slash onwards before calling "zpool get"
would fix this.
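In shell, keeping only the pool-name portion for "zpool get" is a
one-liner (a sketch; the dataset name is just an example):

```shell
# A dataset such as "zfs/images" lives in the zpool named before the
# first '/'; "zpool get" wants only that pool name.
source_name="zfs/images"
pool="${source_name%%/*}"   # drop the first '/' and everything after it
echo "$pool"                # prints: zfs
```

The same expansion is a no-op when source.name is already a bare pool
name, so it is safe to apply unconditionally.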
Arguably this uncovers a couple of other related issues to do with error
handling:
- to the end user, "virsh pool-refresh" silently appears to work (unless
you dig down deep into logs), even though the underlying "zpool get"
returns with an error
- by this stage, pool-refresh has already destroyed all existing libvirt
volumes which were previously in the pool
Regards,
Brian.