[libvirt] Using external ceph.conf for RBD pools and disks

Hi all,

At the moment, RBD storage pools in libvirt must be supplied with a list of Ceph monitor addresses, using <host> elements in the pool's source definition. Ceph itself has a configuration file, and this is used by default by all Ceph command-line utilities. This file can contain the monitor addresses for the cluster, as well as a bunch of other useful options (e.g. for tuning and debugging).

I think it would be nice if libvirt were able to load in this file when starting RBD storage pools. Before I send some patches through, however, I thought I'd better check to see whether my approach is sound.

First, I am not keen on having libvirt get librados to load the configuration file automatically. librados actually uses a search path to find the configuration file, and that path includes silly things like the current working directory. Since it can be told to load a single file, I think it would be better if it were made explicit in the storage pool XML, i.e.:

    <pool type="rbd">
      <name>rbd</name>
      <source>
        <name>rbd</name>
        <config file="/etc/ceph/ceph.conf"/>
        <auth username="user" type="ceph">
          <secret usage="mycephcluster"/>
        </auth>
      </source>
    </pool>

<config> would be able to be used in addition to, or as an alternative to, a list of <host> elements. Would something along these lines be suitable? Would it be better to use the <config> element's text content as the filename, rather than use an attribute? I'm not sure what style guidelines there are for something like this.

The second part is of course to make a similar change to RBD-based domain disk definitions, i.e.:

    ...
    <disk type="network">
      <driver name="qemu" type="raw"/>
      <source protocol="rbd" name="pool/volume">
        <config file="/etc/ceph/ceph.conf"/>
      </source>
      <target dev="vda" bus="virtio"/>
      <auth username="user">
        <secret type="ceph" usage="mycephcluster"/>
      </auth>
    </disk>
    ...

Again, <config> could be used instead of or alongside some <host> elements.

This is where it gets a little tricky. At the moment, <host> in a disk's source definition is entirely optional. Furthermore, QEMU _always_ loads a Ceph configuration file -- either one supplied as a "conf" argument for the block device, or one found through the search path mentioned earlier. The only way to suppress this is to pass conf=/dev/null... but for backwards-compatibility (users may be relying on QEMU's use of the search path), I don't think we can do this now.

There's one final gotcha in all of this: if QEMU is given both a "conf" argument and a "mon_addr" argument, only the latter will take effect. This means if both <config> and <host> are supplied, then the <host> elements will override any monitor addresses from the configuration file.

For consistency, I intend to make an RBD storage pool have the same behaviour. However, would it perhaps be better if the user could only choose _either_ <config> or a list of <host> elements? Personally, I don't think it's a big deal if the behaviour is clearly documented -- being able to load options from a config file while still defining hosts in the libvirt XML could be useful.

Anyway, before I send my patches through I'm interested in hearing people's thoughts on this. All sound sane? Too intrusive? A waste of time? :-)

- Michael
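To make the precedence described in this message concrete, here is a minimal Python sketch, not libvirt or QEMU code. The helpers `rbd_drive_spec` and `effective_monitors` are hypothetical names, and the hostnames and addresses are made up. The monitor option is spelled `mon_host` here, on the assumption that this is how QEMU's rbd string names it (the message calls it "mon_addr"); the backslash escaping of `:` and `;` is likewise an assumption about that string format.

```python
def rbd_drive_spec(image, conf=None, hosts=()):
    """Build a QEMU-style rbd file string (hypothetical helper)."""
    parts = ["rbd:" + image]
    if conf is not None:
        # An explicit conf= stops the implicit search-path lookup.
        parts.append("conf=" + conf)
    if hosts:
        # Options are ':'-separated, so ':' inside a value is escaped;
        # multiple monitors are joined with an escaped ';'.
        escaped = [h.replace(":", "\\:") for h in hosts]
        parts.append("mon_host=" + "\\;".join(escaped))
    return ":".join(parts)


def effective_monitors(conf_monitors, xml_hosts):
    """Model the rule that <host> elements override monitors from the file."""
    return list(xml_hosts) if xml_hosts else list(conf_monitors)


# Only a config file: monitors come from the file.
print(rbd_drive_spec("pool/volume", conf="/etc/ceph/ceph.conf"))
# Both given: the explicit hosts are the ones that take effect.
print(effective_monitors(["10.0.0.1:6789"], ["mon1.example.org:6789"]))
```

Passing `conf="/dev/null"` to the same helper models the suppression case discussed above, where no host configuration file is consulted at all.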

On Sat, Nov 02, 2013 at 12:18:17AM +1100, Michael Chapman wrote:
> Hi all,
>
> At the moment, RBD storage pools in libvirt must be supplied with a list of Ceph monitor addresses, using <host> elements in the pool's source definition. Ceph itself has a configuration file, and this is used by default by all Ceph command-line utilities. This file can contain the monitor addresses for the cluster, as well as a bunch of other useful options (e.g. for tuning and debugging).
>
> I think it would be nice if libvirt were able to load in this file when starting RBD storage pools. Before I send some patches through, however, I thought I'd better check to see whether my approach is sound.
>
> First, I am not keen on having libvirt get librados to load the configuration file automatically. librados actually uses a search path to find the configuration file, and that path includes silly things like the current working directory. Since it can be told to load a single file, I think it would be better if it were made explicit in the storage pool XML, i.e.:
>
>     <pool type="rbd">
>       <name>rbd</name>
>       <source>
>         <name>rbd</name>
>         <config file="/etc/ceph/ceph.conf"/>
>         <auth username="user" type="ceph">
>           <secret usage="mycephcluster"/>
>         </auth>
>       </source>
>     </pool>
>
> <config> would be able to be used in addition to, or as an alternative to, a list of <host> elements. Would something along these lines be suitable? Would it be better to use the <config> element's text content as the filename, rather than use an attribute? I'm not sure what style guidelines there are for something like this.
>
> The second part is of course to make a similar change to RBD-based domain disk definitions, i.e.:
>
>     ...
>     <disk type="network">
>       <driver name="qemu" type="raw"/>
>       <source protocol="rbd" name="pool/volume">
>         <config file="/etc/ceph/ceph.conf"/>
>       </source>
>       <target dev="vda" bus="virtio"/>
>       <auth username="user">
>         <secret type="ceph" usage="mycephcluster"/>
>       </auth>
>     </disk>
>     ...
>
> Again, <config> could be used instead of or alongside some <host> elements.
>
> This is where it gets a little tricky. At the moment, <host> in a disk's source definition is entirely optional. Furthermore, QEMU _always_ loads a Ceph configuration file -- either one supplied as a "conf" argument for the block device, or one found through the search path mentioned earlier. The only way to suppress this is to pass conf=/dev/null... but for backwards-compatibility (users may be relying on QEMU's use of the search path), I don't think we can do this now.
>
> There's one final gotcha in all of this: if QEMU is given both a "conf" argument and a "mon_addr" argument, only the latter will take effect. This means if both <config> and <host> are supplied, then the <host> elements will override any monitor addresses from the configuration file.
>
> For consistency, I intend to make an RBD storage pool have the same behaviour. However, would it perhaps be better if the user could only choose _either_ <config> or a list of <host> elements? Personally, I don't think it's a big deal if the behaviour is clearly documented -- being able to load options from a config file while still defining hosts in the libvirt XML could be useful.
>
> Anyway, before I send my patches through I'm interested in hearing people's thoughts on this. All sound sane? Too intrusive? A waste of time? :-)
We have always taken the position that we do not want to rely on host configuration in this way. The goal of the XML configs is that they fully describe the functional setup of the resource in question. This is to ensure that if you put the same XML config on two different hosts you can be sure that they will operate in the same way. If you leave out a bunch of config information and rely on the host ceph.conf file, then you can no longer ever be sure if two hosts are configured the same way with libvirt.

This is why we do not support use of the dnsmasq.conf file for configuring virtual networks, and why we disable use of the /etc/qemu configuration files for configuring guests. I don't think ceph is special here, so I'd be against relying on an external ceph.conf file too.

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Fri, 1 Nov 2013, Daniel P. Berrange wrote:
> We have always taken the position that we do not want to rely on host configuration in this way. The goal of the XML configs is that they fully describe the functional setup of the resource in question. This is to ensure that if you put the same XML config on two different hosts you can be sure that they will operate in the same way. If you leave out a bunch of config information and rely on the host ceph.conf file, then you can no longer ever be sure if two hosts are configured the same way with libvirt.

I suspected that might be the case -- half the reason I sent my email, really!

If it's desirable to not rely on any host configuration at all, should we explicitly be passing conf=/dev/null to QEMU when setting up an RBD device?

As I mentioned before, without that QEMU will implicitly try to find a system ceph.conf file using a built-in librados search path. Would this actually be a backwards-incompatible change given it was never documented by libvirt?

- Michael

On 11/01/2013 08:31 AM, Michael Chapman wrote:
> On Fri, 1 Nov 2013, Daniel P. Berrange wrote:
> > We have always taken the position that we do not want to rely on host configuration in this way. The goal of the XML configs is that they fully describe the functional setup of the resource in question. This is to ensure that if you put the same XML config on two different hosts you can be sure that they will operate in the same way. If you leave out a bunch of config information and rely on the host ceph.conf file, then you can no longer ever be sure if two hosts are configured the same way with libvirt.
>
> I suspected that might be the case -- half the reason I sent my email, really!
>
> If it's desirable to not rely on any host configuration at all, should we explicitly be passing conf=/dev/null to QEMU when setting up an RBD device?

Sure sounds like it to me.

> As I mentioned before, without that QEMU will implicitly try to find a system ceph.conf file using a built-in librados search path. Would this actually be a backwards-incompatible change given it was never documented by libvirt?

The old behavior is broken, so we can bill this as a bug fix (previously, qemu would behave differently than what the XML defined, which is not supposed to happen) rather than a backwards-incompatible change. Can you propose a patch in time for inclusion in 1.1.4?

--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org

On Fri, 1 Nov 2013, Eric Blake wrote:
> The old behavior is broken, so we can bill this as a bug fix (previously, qemu would behave differently than what the XML defined, which is not supposed to happen) rather than a backwards-incompatible change. Can you propose a patch in time for inclusion in 1.1.4?

I can hammer out a patch quickly, but I won't have a chance to run it on a Ceph-enabled test machine before Monday. It's not critical, so I suggest it goes in after 1.1.4 is released.

- Michael

On 11/01/2013 07:42 AM, Eric Blake wrote:
> On 11/01/2013 08:31 AM, Michael Chapman wrote:
> > As I mentioned before, without that QEMU will implicitly try to find a system ceph.conf file using a built-in librados search path. Would this actually be a backwards-incompatible change given it was never documented by libvirt?
>
> The old behavior is broken, so we can bill this as a bug fix (previously, qemu would behave differently than what the XML defined, which is not supposed to happen) rather than a backwards-incompatible change. Can you propose a patch in time for inclusion in 1.1.4?

This will break OpenStack's usage of libvirt + rbd in Grizzly and earlier releases, which relied on loading ceph.conf for the monitor addresses. This is fixed in OpenStack Havana, but I wanted to note that applications are relying on this behavior.

Passing conf=/dev/null removes the last remaining way of specifying arbitrary ceph options for rbd devices, which is backwards-incompatible in some setups even with well-behaved applications. In general it may break setups using non-default options that libvirt is not aware of.

For example, ceph has an option to require messages to be signed. This is off by default for backwards compatibility with older ceph clients, but it can be enabled for qemu right now by adding an option to /etc/ceph/ceph.conf. If libvirt passes conf=/dev/null, guests are less secure since they may get their data from an untrusted source that does not sign messages.

Ceph is a fast-moving complex project, and there are many options (and will be more in the future) that affect security, performance tuning, run-time introspection, logging, etc. I don't think libvirt should remove the ability to configure these settings without having a way to add them via xml.

It doesn't seem feasible to make libvirt (and all applications using it) aware of all existing and new options, especially since many of them are quite ceph-specific. Instead, I'd like to propose a mechanism for passing through generic key/value pairs to configure block devices.

Concretely, this could be something like:

    <disk type='network'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source protocol='rbd' name='pool/image'>
        <host name='mon1.example.org'/>
        <option name="cephx require signatures" value="true"/>
        <option name="rbd cache size" value="131768"/>
        <option name="rbd cache max dirty" value="131768"/>
        <option name="rbd cache max dirty age" value="1.5"/>
        <option name="rbd balance snap reads" value="true"/>
        <option name="debug ms" value="0/0"/>
        <option name="debug auth" value="0/0"/>
        <option name="debug rados" value="0/0"/>
      </source>
    </disk>

I don't care about the particular format, just that there's a way to set these kinds of settings. It's much easier for users of libvirt and ceph if these are treated as opaque strings by libvirt, since they can upgrade ceph and use new options without upgrading libvirt and any applications using it as well.

I'm happy to provide patches if this approach is acceptable.

Josh
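As a rough illustration of the "opaque strings" idea in this proposal, the following Python sketch reads `<option>` elements from a shortened copy of the disk XML and forwards them without interpreting them. It is only a sketch under stated assumptions: the `<option>` element is the proposal's own hypothetical syntax, `passthrough_options` and `to_rbd_suffix` are made-up helper names, and both the space-to-underscore key spelling and the `:`-separated, backslash-escaped output format are assumptions about how such pairs might be handed to QEMU.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the proposed disk definition.
DISK_XML = """
<disk type='network'>
  <driver name='qemu' type='raw' cache='writeback'/>
  <source protocol='rbd' name='pool/image'>
    <host name='mon1.example.org'/>
    <option name="cephx require signatures" value="true"/>
    <option name="rbd cache size" value="131768"/>
    <option name="debug ms" value="0/0"/>
  </source>
</disk>
"""


def passthrough_options(disk_xml):
    """Return (name, value) pairs exactly as written, in document order."""
    source = ET.fromstring(disk_xml).find("source")
    return [(o.get("name"), o.get("value")) for o in source.findall("option")]


def to_rbd_suffix(options):
    """Render the pairs as ':key=value' items without validating the names."""
    items = []
    for name, value in options:
        key = name.replace(" ", "_")          # assumed key spelling
        items.append(f"{key}={value.replace(':', chr(92) + ':')}")
    return "".join(":" + item for item in items)


options = passthrough_options(DISK_XML)
print(options)
print(to_rbd_suffix(options))
```

Because nothing here checks option names against a known list, a newly introduced Ceph option would pass straight through, which is the property the proposal is asking for.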
participants (4)
- Daniel P. Berrange
- Eric Blake
- Josh Durgin
- Michael Chapman