Re: [libvirt] [Qemu-devel] live snapshot wiki updated

[adding the libvir-list] On 07/19/2011 08:09 AM, Jes Sorensen wrote:
On 07/19/11 15:58, Eric Blake wrote:
On 07/19/2011 07:27 AM, Jes Sorensen wrote:
Eric, what happens if libvirt in an selinux environment tells QEMU to launch using an image file that is backed by backing file(s)?
Before starting qemu, libvirt first parses all the image files, to see if any of them have backing images. For every qcow2 or qed image with a backing file, libvirt sets the SELinux context of both the qcow2 image and its backing file so that qemu will be able to successfully open() them. But if any of those files reside on NFS, then it is not possible to label individual files, so it requires setting the SELinux bool virt_use_nfs, which thus gives qemu the power to open() arbitrary files on NFS, and you've lost security.
Urgh, libvirt parsing image files is really unfortunate, it really doesn't give me warm fuzzy feelings :( libvirt really should not know about internals of image formats.
But even if you add new features to qemu to avoid needing this in the future, it doesn't change the past - libvirt will always have to know how to parse image files understood by older qemu, and so as long as libvirt already knows how to do that parsing, we might as well take advantage of it. Besides, I feel that having a well-documented file format, so that independent applications can both parse the same file with the same semantics by obeying the file format specification, is a good design goal.
It would be nice if libvirt had a way to pass fds for every disk and backing file up front; then, SELinux can work around the lack of NFS per-file labelling by blocking open() in qemu. In fact, this has already been proposed:
A cleaner solution seems to have libvirt provide a call-back allowing QEMU to call out and have libvirt open a file descriptor instead. This way libvirt can validate it and open it for QEMU and pass it back.
Yes, that could probably be made to work with libvirt.
If we cannot do something like this, I would prefer to have backing files on NFS should simply not be supported when running in an selinux setup.
As nice as that sentiment is, it will never fly, because it would be a regression in current behavior. The whole reason that the virt_use_nfs SELinux bool exists is that some people are willing to make the partial security tradeoff. Besides, the use of sVirt via SELinux is more than just open() protection - while the current virt_use_nfs bool makes NFS less secure than otherwise possible, it still gives some nice guarantees to the rest of the qemu process such as passthrough accesses to local pci devices. Just because it is currently not as secure to mix NFS shared storage with backing files doesn't stop some people from wanting to do it [in fact, that's my current development setup - I use qcow2 images on NFS shared storage, keep SELinux enabled, and enable the virt_use_nfs bool]. This discussion is about adding enhancements that make SELinux even more powerful when using NFS shared storage, by adding fd passing (whether libvirt parses in advance, or whether qemu raises an event and requires feedback from libvirt), and not about crippling the existing capability to use the virt_use_nfs selinux bool. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On 07/19/11 16:24, Eric Blake wrote:
[adding the libvir-list] On 07/19/2011 08:09 AM, Jes Sorensen wrote:
Urgh, libvirt parsing image files is really unfortunate, it really doesn't give me warm fuzzy feelings :( libvirt really should not know about internals of image formats.
But even if you add new features to qemu to avoid needing this in the future, it doesn't change the past - libvirt will always have to know how to parse image files understood by older qemu, and so as long as libvirt already knows how to do that parsing, we might as well take advantage of it.
What has been done here in the past is plain wrong. Continuing to do it isn't the right thing to do here.
Besides, I feel that having a well-documented file format, so that independent applications can both parse the same file with the same semantics by obeying the file format specification, is a good design goal.
We all know that documentation is rarely uptodate, new features may not get added and libvirt will never be able to keep up. The driver for a file format belongs in QEMU and nowhere else.
It would be nice if libvirt had a way to pass fds for every disk and backing file up front; then, SELinux can work around the lack of NFS per-file labelling by blocking open() in qemu. In fact, this has already been proposed:
A cleaner solution seems to have libvirt provide a call-back allowing QEMU to call out and have libvirt open a file descriptor instead. This way libvirt can validate it and open it for QEMU and pass it back.
Yes, that could probably be made to work with libvirt.
I am a little frustrated this approach wasn't taken up front instead of the evil hack of having libvirt attempt to parse image files.
If we cannot do something like this, I would prefer to have backing files on NFS should simply not be supported when running in an selinux setup.
As nice as that sentiment is, it will never fly, because it would be a regression in current behavior. The whole reason that the virt_use_nfs SELinux bool exists is that some people are willing to make the partial security tradeoff. Besides, the use of sVirt via SELinux is more than just open() protection - while the current virt_use_nfs bool makes NFS less secure than otherwise possible, it still gives some nice guarantees to the rest of the qemu process such as passthrough accesses to local pci devices.
Well leaving things at status quo is not making it worse, it just leaves an evil in place.
Just because it is currently not as secure to mix NFS shared storage with backing files doesn't stop some people from wanting to do it [in fact, that's my current development setup - I use qcow2 images on NFS shared storage, keep SELinux enabled, and enable the virt_use_nfs bool]. This discussion is about adding enhancements that make SELinux even more powerful when using NFS shared storage, by adding fd passing (whether libvirt parses in advance, or whether qemu raises an event and requires feedback from libvirt), and not about crippling the existing capability to use the virt_use_nfs selinux bool.
I do not believe we should try and add extra interfaces to support something which is inherently broken. This really boils down to whether we should support fd passing for snapshots in the first place. If it is to support the broken setup of libvirt parsing image files, then I am totally against it, if we work on a proper solution that involves this in some way, then we can discuss it. Cheers, Jes

On Tue, Jul 19, 2011 at 3:30 PM, Jes Sorensen <Jes.Sorensen@redhat.com> wrote:
On 07/19/11 16:24, Eric Blake wrote:
[adding the libvir-list] On 07/19/2011 08:09 AM, Jes Sorensen wrote:
Urgh, libvirt parsing image files is really unfortunate, it really doesn't give me warm fuzzy feelings :( libvirt really should not know about internals of image formats.
But even if you add new features to qemu to avoid needing this in the future, it doesn't change the past - libvirt will always have to know how to parse image files understood by older qemu, and so as long as libvirt already knows how to do that parsing, we might as well take advantage of it.
What has been done here in the past is plain wrong. Continuing to do it isn't the right thing to do here.
Besides, I feel that having a well-documented file format, so that independent applications can both parse the same file with the same semantics by obeying the file format specification, is a good design goal.
We all know that documentation is rarely uptodate, new features may not get added and libvirt will never be able to keep up. The driver for a file format belongs in QEMU and nowhere else.
It should be a goal to avoid dependencies in multiple layers of the stack because it becomes are to add new features - they require coordinated changes in multiple layers. Having both QEMU and libvirt know the internals of image files is such a multi-dependency. If I want to add a new format or change an existing format I have to touch both layers. For fd-passing perhaps we have an opportunity to use a callback mechanism (QEMU request: filename -> libvirt response: fd) and do all the image format parsing in QEMU. Stefan

On Tue, Jul 19, 2011 at 04:14:27PM +0100, Stefan Hajnoczi wrote:
On Tue, Jul 19, 2011 at 3:30 PM, Jes Sorensen <Jes.Sorensen@redhat.com> wrote:
On 07/19/11 16:24, Eric Blake wrote:
[adding the libvir-list] On 07/19/2011 08:09 AM, Jes Sorensen wrote:
Urgh, libvirt parsing image files is really unfortunate, it really doesn't give me warm fuzzy feelings :( libvirt really should not know about internals of image formats.
But even if you add new features to qemu to avoid needing this in the future, it doesn't change the past - libvirt will always have to know how to parse image files understood by older qemu, and so as long as libvirt already knows how to do that parsing, we might as well take advantage of it.
What has been done here in the past is plain wrong. Continuing to do it isn't the right thing to do here.
Besides, I feel that having a well-documented file format, so that independent applications can both parse the same file with the same semantics by obeying the file format specification, is a good design goal.
We all know that documentation is rarely uptodate, new features may not get added and libvirt will never be able to keep up. The driver for a file format belongs in QEMU and nowhere else.
It should be a goal to avoid dependencies in multiple layers of the stack because it becomes are to add new features - they require coordinated changes in multiple layers. Having both QEMU and libvirt know the internals of image files is such a multi-dependency. If I want to add a new format or change an existing format I have to touch both layers.
For fd-passing perhaps we have an opportunity to use a callback mechanism (QEMU request: filename -> libvirt response: fd) and do all the image format parsing in QEMU.
The reason why libvirt does the parsing of file headers to determine backing files is to maintain the trust boundary. Everything run from the exec() of QEMU onwards is considered untrusted code. So having QEMU parsing the file headers & passing back open() requests to libvirt is breaking the trust boundary. NB, i'm not happy about libvirt having to have knowledge of file format headers, but we needed something more efficient & reliable than invoking qemu-img info & parsing the output. Ideally QEMU (or something else) would provide a library libblockformat.so with stable APIs for at least reading metadata about image formats. If it had APIs for image creation, etc too that would be a bonus, but we're more or less ok spawning qemu-img for those cases currently. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

"Daniel P. Berrange" <berrange@redhat.com> writes:
On Tue, Jul 19, 2011 at 04:14:27PM +0100, Stefan Hajnoczi wrote:
On Tue, Jul 19, 2011 at 3:30 PM, Jes Sorensen <Jes.Sorensen@redhat.com> wrote:
On 07/19/11 16:24, Eric Blake wrote:
[adding the libvir-list] On 07/19/2011 08:09 AM, Jes Sorensen wrote:
Urgh, libvirt parsing image files is really unfortunate, it really doesn't give me warm fuzzy feelings :( libvirt really should not know about internals of image formats.
But even if you add new features to qemu to avoid needing this in the future, it doesn't change the past - libvirt will always have to know how to parse image files understood by older qemu, and so as long as libvirt already knows how to do that parsing, we might as well take advantage of it.
What has been done here in the past is plain wrong. Continuing to do it isn't the right thing to do here.
Besides, I feel that having a well-documented file format, so that independent applications can both parse the same file with the same semantics by obeying the file format specification, is a good design goal.
We all know that documentation is rarely uptodate, new features may not get added and libvirt will never be able to keep up. The driver for a file format belongs in QEMU and nowhere else.
It should be a goal to avoid dependencies in multiple layers of the stack because it becomes are to add new features - they require coordinated changes in multiple layers. Having both QEMU and libvirt know the internals of image files is such a multi-dependency. If I want to add a new format or change an existing format I have to touch both layers.
For fd-passing perhaps we have an opportunity to use a callback mechanism (QEMU request: filename -> libvirt response: fd) and do all the image format parsing in QEMU.
The reason why libvirt does the parsing of file headers to determine backing files is to maintain the trust boundary. Everything run from the exec() of QEMU onwards is considered untrusted code. So having QEMU parsing the file headers & passing back open() requests to libvirt is breaking the trust boundary.
Exactly. The block drivers form a tree. Each driver opens its children. For convenience, some of them have information on how to open their children encoded in their image (e.g. COW backing images). Others receive it as configuration, encoded in their "filename" string (e.g. blkdebug). Thus, we have direct control only over the root block driver. We can pass it fds easily. We control the non-root block drivers only indirectly, through the block drivers along the path from the root. We don't have a way to pass them fds. If we had a way to construct the tree bottom-up, we could directly control all the block drivers. Requires knowing the number of children, and extracting child configuration from images where applicable. [...]

On 07/19/11 18:46, Daniel P. Berrange wrote:
On Tue, Jul 19, 2011 at 04:14:27PM +0100, Stefan Hajnoczi wrote:
For fd-passing perhaps we have an opportunity to use a callback mechanism (QEMU request: filename -> libvirt response: fd) and do all the image format parsing in QEMU.
The reason why libvirt does the parsing of file headers to determine backing files is to maintain the trust boundary. Everything run from the exec() of QEMU onwards is considered untrusted code. So having QEMU parsing the file headers & passing back open() requests to libvirt is breaking the trust boundary.
Pardon, but I fail to see the issue here. If QEMU passes a filename back to libvirt, libvirt still gets to make the decision whether or not it is legitimate for QEMU to get that file descriptor or not. It doesn't change anything wrt who actually opens the file, hence the 'trust' is unchanged.
NB, i'm not happy about libvirt having to have knowledge of file format headers, but we needed something more efficient & reliable than invoking qemu-img info & parsing the output. Ideally QEMU (or something else) would provide a library libblockformat.so with stable APIs for at least reading metadata about image formats. If it had APIs for image creation, etc too that would be a bonus, but we're more or less ok spawning qemu-img for those cases currently.
Even having a library for libvirt to link against is suboptimal here. Two processes shouldn't be fighting over the internals of metadata, the ownership of the metadata belongs solely with QEMU. In addition you have the constant issue of dependencies there, hence if QEMU is updated and it provides a newer block format library, it may prevent libvirt from running forcing an update of libvirt as well. That is not acceptable for development. Cheers, Jes

On Wed, Jul 20, 2011 at 10:23:12AM +0200, Jes Sorensen wrote:
On 07/19/11 18:46, Daniel P. Berrange wrote:
On Tue, Jul 19, 2011 at 04:14:27PM +0100, Stefan Hajnoczi wrote:
For fd-passing perhaps we have an opportunity to use a callback mechanism (QEMU request: filename -> libvirt response: fd) and do all the image format parsing in QEMU.
The reason why libvirt does the parsing of file headers to determine backing files is to maintain the trust boundary. Everything run from the exec() of QEMU onwards is considered untrusted code. So having QEMU parsing the file headers & passing back open() requests to libvirt is breaking the trust boundary.
Pardon, but I fail to see the issue here. If QEMU passes a filename back to libvirt, libvirt still gets to make the decision whether or not it is legitimate for QEMU to get that file descriptor or not. It doesn't change anything wrt who actually opens the file, hence the 'trust' is unchanged.
To make the decision whether the filename from QEMU is valid, we have to parse the master image header data to see if the filename actually matches the backing file required by the image assigned to the guest.
NB, i'm not happy about libvirt having to have knowledge of file format headers, but we needed something more efficient & reliable than invoking qemu-img info & parsing the output. Ideally QEMU (or something else) would provide a library libblockformat.so with stable APIs for at least reading metadata about image formats. If it had APIs for image creation, etc too that would be a bonus, but we're more or less ok spawning qemu-img for those cases currently.
Even having a library for libvirt to link against is suboptimal here. Two processes shouldn't be fighting over the internals of metadata, the ownership of the metadata belongs solely with QEMU. In addition you have the constant issue of dependencies there, hence if QEMU is updated and it provides a newer block format library, it may prevent libvirt from running forcing an update of libvirt as well. That is not acceptable for development.
We're not fighting over the internals of metadata. We just need to know ahead of launching QEMU, what backing files an image has & what format they are in. We do that by reading at the metadata headers of the disk images. We never attempt to write to the disk images. Either someone provides a library todo that, or we write the probing code for each file format in libvirt. Currently we have the latter. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

The 20/07/11, Daniel P. Berrange wrote:
To make the decision whether the filename from QEMU is valid, we have to parse the master image header data to see if the filename actually matches the backing file required by the image assigned to the guest.
Actually, libvirt should not have to worry if the filename provided by QEMU is valid. I think it should trust QEMU. If QEMU doesn't provide information others can trust; it should be fixed at QEMU side.
We're not fighting over the internals of metadata. We just need to know ahead of launching QEMU, what backing files an image has & what format they are in. We do that by reading at the metadata headers of the disk images. We never attempt to write to the disk images. Either someone provides a library todo that, or we write the probing code for each file format in libvirt. Currently we have the latter.
This is what I would call "fighting with QEMU internals". How do you prevent from concurrency access and modifications? Ideally speacking libvirt should be able to co-exist with foreign implementations, all requesting QEMU. -- Nicolas Sebrecht

On Wed, Jul 20, 2011 at 12:15:02PM +0200, Nicolas Sebrecht wrote:
The 20/07/11, Daniel P. Berrange wrote:
To make the decision whether the filename from QEMU is valid, we have to parse the master image header data to see if the filename actually matches the backing file required by the image assigned to the guest.
Actually, libvirt should not have to worry if the filename provided by QEMU is valid. I think it should trust QEMU. If QEMU doesn't provide information others can trust; it should be fixed at QEMU side.
The security goal of libvirt is to protect the host from a compromised QEMU, therefore QEMU is, by definition, untrusted.
We're not fighting over the internals of metadata. We just need to know ahead of launching QEMU, what backing files an image has & what format they are in. We do that by reading at the metadata headers of the disk images. We never attempt to write to the disk images. Either someone provides a library todo that, or we write the probing code for each file format in libvirt. Currently we have the latter.
This is what I would call "fighting with QEMU internals". How do you prevent from concurrency access and modifications? Ideally speacking libvirt should be able to co-exist with foreign implementations, all requesting QEMU.
QEMU is *not* yet running at the time libvirt reads the file metadata. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Wed, Jul 20, 2011 at 11:28 AM, Daniel P. Berrange <berrange@redhat.com> wrote:
On Wed, Jul 20, 2011 at 12:15:02PM +0200, Nicolas Sebrecht wrote:
The 20/07/11, Daniel P. Berrange wrote:
To make the decision whether the filename from QEMU is valid, we have to parse the master image header data to see if the filename actually matches the backing file required by the image assigned to the guest.
Actually, libvirt should not have to worry if the filename provided by QEMU is valid. I think it should trust QEMU. If QEMU doesn't provide information others can trust; it should be fixed at QEMU side.
The security goal of libvirt is to protect the host from a compromised QEMU, therefore QEMU is, by definition, untrusted.
This is a very reasonable goal. QEMU is constantly dealing with the untrusted guest. The whole point of SELinux isolation of QEMU is to contain any compromise to a single VM and reduce the capabilities of that process to the minimum. libvirt needs to help set the boundaries of what the QEMU process can do. Stefan

On 07/20/11 12:28, Daniel P. Berrange wrote:
On Wed, Jul 20, 2011 at 12:15:02PM +0200, Nicolas Sebrecht wrote:
Actually, libvirt should not have to worry if the filename provided by QEMU is valid. I think it should trust QEMU. If QEMU doesn't provide information others can trust; it should be fixed at QEMU side.
The security goal of libvirt is to protect the host from a compromised QEMU, therefore QEMU is, by definition, untrusted.
Well that part goes both ways. By applying this model you are going to make it a hard requirement for libvirt to be upgraded with QEMU even for smaller updates.
We're not fighting over the internals of metadata. We just need to know ahead of launching QEMU, what backing files an image has & what format they are in. We do that by reading at the metadata headers of the disk images. We never attempt to write to the disk images. Either someone provides a library todo that, or we write the probing code for each file format in libvirt. Currently we have the latter.
This is what I would call "fighting with QEMU internals". How do you prevent from concurrency access and modifications? Ideally speacking libvirt should be able to co-exist with foreign implementations, all requesting QEMU.
QEMU is *not* yet running at the time libvirt reads the file metadata.
Of course it is: hotplug Jes

On Thu, Jul 21, 2011 at 10:40:48AM +0200, Jes Sorensen wrote:
On 07/20/11 12:28, Daniel P. Berrange wrote:
On Wed, Jul 20, 2011 at 12:15:02PM +0200, Nicolas Sebrecht wrote:
Actually, libvirt should not have to worry if the filename provided by QEMU is valid. I think it should trust QEMU. If QEMU doesn't provide information others can trust; it should be fixed at QEMU side.
The security goal of libvirt is to protect the host from a compromised QEMU, therefore QEMU is, by definition, untrusted.
Well that part goes both ways. By applying this model you are going to make it a hard requirement for libvirt to be upgraded with QEMU even for smaller updates.
We're not fighting over the internals of metadata. We just need to know ahead of launching QEMU, what backing files an image has & what format they are in. We do that by reading at the metadata headers of the disk images. We never attempt to write to the disk images. Either someone provides a library todo that, or we write the probing code for each file format in libvirt. Currently we have the latter.
This is what I would call "fighting with QEMU internals". How do you prevent from concurrency access and modifications? Ideally speacking libvirt should be able to co-exist with foreign implementations, all requesting QEMU.
QEMU is *not* yet running at the time libvirt reads the file metadata.
Of course it is: hotplug
In the case of hotplug, the hotplug monitor commands have not yet been invoked when libvirt access the file metadata, or the drive has already been unplugged, so there is still no concurrent access to the file by QEMU and libvirt. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 07/21/11 11:34, Daniel P. Berrange wrote:
QEMU is *not* yet running at the time libvirt reads the file metadata.
Of course it is: hotplug In the case of hotplug, the hotplug monitor commands have not yet been invoked when libvirt access the file metadata, or the drive has already been unplugged, so there is still no concurrent access to the file by QEMU and libvirt.
Unless someone tries to hotplug an image that relies on a backing file already used by another image. While unlikely, it is possible. Jes

On 07/21/2011 03:34 AM, Jes Sorensen wrote:
On 07/21/11 11:34, Daniel P. Berrange wrote:
QEMU is *not* yet running at the time libvirt reads the file metadata.
Of course it is: hotplug In the case of hotplug, the hotplug monitor commands have not yet been invoked when libvirt access the file metadata, or the drive has already been unplugged, so there is still no concurrent access to the file by QEMU and libvirt.
Unless someone tries to hotplug an image that relies on a backing file already used by another image. While unlikely, it is possible.
Backing images should be treated as read-only by qemu, right? That is, if I have file c.img which uses ca.img as its backing file, then qemu only needs O_RDONLY access to ca.img, right? My understanding is that you never want to have a qcow2 image whose backing file is being simultaneously edited by another external process, otherwise you risk data corruption from the point of view of the qemu process that is trying to refer to that backing file. Given that restriction, if I want to hotplug file d.img that _also_ uses ca.img as its backing file, then that's not an issue. Libvirt _still_ reads the metadata of d.img, and learns of the backing file of ca.img, all before calling the hotplug command, even if ca.img is already in use by qemu, with no complications. You still haven't managed to convince me that there is ever any situation where qemu needs to open a backing file that is not already determined /a priori/ by reading just the metadata of the files being handed to qemu in the first place, or by being aware of the metadata relationship being requested of qemu in the first place (such as the relationship implied by calling snapshot_blkdev to create a qcow2 file with an existing file as backing). -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On 07/21/2011 02:40 AM, Jes Sorensen wrote:
The security goal of libvirt is to protect the host from a compromised QEMU, therefore QEMU is, by definition, untrusted.
Well that part goes both ways. By applying this model you are going to make it a hard requirement for libvirt to be upgraded with QEMU even for smaller updates.
Right now, libvirt only needs the name of the backing file and type. How often has the qcow2 file format changed in such a way that that particular information has been incompatibly moved around, so that clients have to learn a new file format in order to parse the information in its new location? The set of information that libvirt needs out of a qcow2 image is relatively small and stable, compared to any other changes being made in the rest of the file format, and the libvirt folks are more than welcome to help review any qcow2 format changes for stability impacts. Furthermore, you are forgetting that libvirt already ends up upgrading to pick up new qemu features all the time, not just for qcow2 layout changes. If you are okay with the feature set supported by an older qemu, then you are also okay using an older libvirt that targets just that feature set - you only have to upgrade libvirt when talking to a newer qemu that requires the use of a newer feature set. Older libvirt can generally talk to newer qemu if qemu made backwards-compatible changes. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On 07/21/11 15:47, Eric Blake wrote:
On 07/21/2011 02:40 AM, Jes Sorensen wrote:
The security goal of libvirt is to protect the host from a compromised QEMU, therefore QEMU is, by definition, untrusted.
Well that part goes both ways. By applying this model you are going to make it a hard requirement for libvirt to be upgraded with QEMU even for smaller updates.
Right now, libvirt only needs the name of the backing file and type. How often has the qcow2 file format changed in such a way that that particular information has been incompatibly moved around, so that clients have to learn a new file format in order to parse the information in its new location? The set of information that libvirt needs out of a qcow2 image is relatively small and stable, compared to any other changes being made in the rest of the file format, and the libvirt folks are more than welcome to help review any qcow2 format changes for stability impacts.
QED, QCOW3 either way, libvirt should be able to boot a guest with an upgraded QEMU, even if it doesn't support all the features. In this case you are going to provide a hard requirement on libvirt being upgraded in sync.
Furthermore, you are forgetting that libvirt already ends up upgrading to pick up new qemu features all the time, not just for qcow2 layout changes. If you are okay with the feature set supported by an older qemu, then you are also okay using an older libvirt that targets just that feature set - you only have to upgrade libvirt when talking to a newer qemu that requires the use of a newer feature set. Older libvirt can generally talk to newer qemu if qemu made backwards-compatible changes.
There is a difference between upgrading to pick up additional features and forced upgrade. It is not just a question of libvirt possibly reviewing a file format, it is also having two codebases that needs to be implemented and maintained. Jes

On 07/21/2011 08:19 AM, Jes Sorensen wrote:
There is a difference between upgrading to pick up additional features and forced upgrade.
It is not just a question of libvirt possibly reviewing a file format, it is also having two codebases that needs to be implemented and maintained.
Then give us a libblockid.so that provides a stable interface, and which is also used by qemu to manage QED, qcow3, ..., so that libvirt can rely on the stable library interface rather than having to learn new file formats all the time. But yes, we already had to upgrade libvirt code to parse qed formats, and anyone wanting to use qed via libvirt has to do a lockstep upgrade of both qemu and libvirt, because no one has yet written a nice shared library that can do it on our behalf through a stable API. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On 07/20/11 11:36, Daniel P. Berrange wrote:
On Wed, Jul 20, 2011 at 10:23:12AM +0200, Jes Sorensen wrote:
Pardon, but I fail to see the issue here. If QEMU passes a filename back to libvirt, libvirt still gets to make the decision whether or not it is legitimate for QEMU to get that file descriptor or not. It doesn't change anything wrt who actually opens the file, hence the 'trust' is unchanged.
To make the decision whether the filename from QEMU is valid, we have to parse the master image header data to see if the filename actually matches the backing file required by the image assigned to the guest.
Sorry but that doesn't make any sense. In other words, if someone hacks an image and makes it point to a different file, you are going to allow the backing file to be opened just because it is listed in the image? If this is really the approach you are suggesting, it seems to me the whole 'do not allow random opens on NFS' security thing has gone out the window. To the best of my understanding, the whole idea with selinux attributes was to be able to specify which files are allowed to be opened by a given process. Mapping this to the libvirt model, it should mean libvirt ought to carry a positive list of files that are allowed to be opened, rather than relying on what might be written to an image file. Jes

On 07/21/2011 02:38 AM, Jes Sorensen wrote:
On 07/20/11 11:36, Daniel P. Berrange wrote:
On Wed, Jul 20, 2011 at 10:23:12AM +0200, Jes Sorensen wrote:
Pardon, but I fail to see the issue here. If QEMU passes a filename back to libvirt, libvirt still gets to make the decision whether or not it is legitimate for QEMU to get that file descriptor or not. It doesn't change anything wrt who actually opens the file, hence the 'trust' is unchanged.
To make the decision whether the filename from QEMU is valid, we have to parse the master image header data to see if the filename actually matches the backing file required by the image assigned to the guest.
Sorry but that doesn't make any sense. In other words, if someone hacks an image and makes it point to a different file, you are going to allow the backing file to be opened just because it is listed in the image?
Yes, because the only way someone could hack that image is if they have rights to that file in the first place. And if they have rights to that file in the first place, then they also can call qemu-img to modify that file, or any other number of modifications - but that's not an escalation in privilege, so it is not a security hole. From the security point of view, the problem we're solving here is not whether the host's file system is properly secured so that only the right people can hack image files in the first place - that problem has to already be solved before adding sVirt makes any sense. Rather, the problem we're talking about here is the sVirt security setup - where no guest can corrupt the data of any other guest, given that the host is already secure. As long as no one qemu process can open the qcow2 file of another guest, then it doesn't matter what other security flaws might be found in qemu - you have guaranteed that a single rogue guest-in-qemu situation cannot corrupt any other guests, because the rogue process cannot hack any images not belonging to the rogue guest.
To the best of my understanding, the whole idea with selinux attributes was to be able to specify which files are allowed to be opened by a given process. Mapping this to the libvirt model, it should mean libvirt ought to carry a positive list of files that are allowed to be opened,
Which it does, by reading the metadata of those files prior to the point of ever handing those files over to an untrusted process - except in one case: right now, libvirt is not validating that a qcow2 file is still in a sane state after a qemu process ends.
rather than relying on what might be written to an image file.
Thank you for persisting - you've found another hole that needs to be plugged. It sounds like you are proposing that after a qemu process dies, that libvirt re-reads the qcow2 metadata headers, and validates that the backing file information has not changed in a manner unexpected by libvirt. If it has, then the qemu process that just died was compromised to the point that restarting a new qemu process from the old image is now a security risk. So this is _yet another_ security aspect that needs to be coded into libvirt as part of hardening sVirt. But it is an independent issue of the need for fd passing. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On 07/21/11 16:02, Eric Blake wrote:
On 07/21/2011 02:38 AM, Jes Sorensen wrote:
On 07/20/11 11:36, Daniel P. Berrange wrote:
On Wed, Jul 20, 2011 at 10:23:12AM +0200, Jes Sorensen wrote:
Pardon, but I fail to see the issue here. If QEMU passes a filename back to libvirt, libvirt still gets to make the decision whether or not it is legitimate for QEMU to get that file descriptor or not. It doesn't change anything wrt who actually opens the file, hence the 'trust' is unchanged.
To make the decision whether the filename from QEMU is valid, we have to parse the master image header data to see if the filename actually matches the backing file required by the image assigned to the guest.
Sorry but that doesn't make any sense. In other words, if someone hacks an image and makes it point to a different file, you are going to allow the backing file to be opened just because it is listed in the image?
Yes, because the only way someone could hack that image is if they have rights to that file in the first place. And if they have rights to that file in the first place, then they also can call qemu-img to modify that file, or any other number of modifications - but that's not an escalation in privilege, so it is not a security hole.
Ehm, we're talking about the case where a guest is able to compromise the QEMU process controlling it. This is what libvirt / selinux is trying to prevent. Now if the guest is able to break out of it's shell, it can modify the disk image headers. Crash the QEMU process and wait for the guest to be rebooted by the admin..... Voila we now have access to random files on the NFS store we couldn't reach before. So back to the drawing board, we do have a security problem here....
To the best of my understanding, the whole idea with selinux attributes was to be able to specify which files are allowed to be opened by a given process. Mapping this to the libvirt model, it should mean libvirt ought to carry a positive list of files that are allowed to be opened,
Which it does, by reading the metadata of those files prior to the point of ever handing those files over to an untrusted process - except in one case: right now, libvirt is not validating that a qcow2 file is still in a sane state after a qemu process ends.
You have previously explained that anything written to the QEMU header is trusted by libvirt in the first place.
rather than relying on what might be written to an image file.
Thank you for persisting - you've found another hole that needs to be plugged. It sounds like you are proposing that after a qemu process dies, that libvirt re-reads the qcow2 metadata headers, and validates that the backing file information has not changed in a manner unexpected by libvirt. If it has, then the qemu process that just died was compromised to the point that restarting a new qemu process from the old image is now a security risk. So this is _yet another_ security aspect that needs to be coded into libvirt as part of hardening sVirt.
But it is an independent issue of the need for fd passing.
No it is not independent, we're back to the point where we started. The issue at hand is still that libvirt has no business messing about in metadata headers of guest images. libvirt has the 'right' to validate file names, but that is not justification in messing about in metadata. Allowing that is an endless cat and mouse hunt of keeping up, which is guaranteed to get out of sync. If this problem is to be fixed, lets fix it correctly. Jes

On Thu, Jul 21, 2011 at 3:02 PM, Eric Blake <eblake@redhat.com> wrote:
Thank you for persisting - you've found another hole that needs to be plugged. It sounds like you are proposing that after a qemu process dies, that libvirt re-reads the qcow2 metadata headers, and validates that the backing file information has not changed in a manner unexpected by libvirt. If it has, then the qemu process that just died was compromised to the point that restarting a new qemu process from the old image is now a security risk. So this is _yet another_ security aspect that needs to be coded into libvirt as part of hardening sVirt.
The backing file information changes when image streaming completes. Before: fedora.img <- my_vm.qed After: my_vm.qed (fedora.img is no longer referenced) The image streaming operation copies data out of fedora.img and populates my_vm.qed. When image streaming completes, the backing file is no longer needed and my_vm.qed is updated to drop the backing file. I think we need to design carefully to prevent QEMU and libvirt making incorrect assumptions about who does what. I really wish that all this image file business was outside QEMU and libvirt - that we had a separate storage management service which handled the details. QEMU would only do block device operations (no image format manipulation), and libvirt would only delegate to the storage management service. Today we seem to be sprinkling a little bit of storage management into QEMU and a little bit into libvirt :(. In that spirit it is much nicer to think of storage like a SAN appliance where you have LUNs that you access as block devices. It also provides an API for snapshotting, cloning LUNs, etc. Let's move to that model instead of worrying about how to spread storage logic across QEMU and libvirt. Stefan

On Thu, Jul 21, 2011 at 6:01 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
On Thu, Jul 21, 2011 at 3:02 PM, Eric Blake <eblake@redhat.com> wrote:
Thank you for persisting - you've found another hole that needs to be plugged. It sounds like you are proposing that after a qemu process dies, that libvirt re-reads the qcow2 metadata headers, and validates that the backing file information has not changed in a manner unexpected by libvirt. If it has, then the qemu process that just died was compromised to the point that restarting a new qemu process from the old image is now a security risk. So this is _yet another_ security aspect that needs to be coded into libvirt as part of hardening sVirt.
The backing file information changes when image streaming completes.
Before: fedora.img <- my_vm.qed After: my_vm.qed (fedora.img is no longer referenced)
The image streaming operation copies data out of fedora.img and populates my_vm.qed. When image streaming completes, the backing file is no longer needed and my_vm.qed is updated to drop the backing file.
I think we need to design carefully to prevent QEMU and libvirt making incorrect assumptions about who does what. I really wish that all this image file business was outside QEMU and libvirt - that we had a separate storage management service which handled the details. QEMU would only do block device operations (no image format manipulation), and libvirt would only delegate to the storage management service. Today we seem to be sprinkling a little bit of storage management into QEMU and a little bit into libvirt :(.
In that spirit it is much nicer to think of storage like a SAN appliance where you have LUNs that you access as block devices. It also provides an API for snapshotting, cloning LUNs, etc.
Let's move to that model instead of worrying about how to spread storage logic across QEMU and libvirt.
Would NBD protocol fit to this purpose, or is it too simple? Then libvirt would handle the storage format completely and present an NBD interface to QEMU (or give an fd to an external service) and QEMU would not care about the storage format in this mode at all.

On Thu, Jul 21, 2011 at 8:42 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
On Thu, Jul 21, 2011 at 6:01 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
On Thu, Jul 21, 2011 at 3:02 PM, Eric Blake <eblake@redhat.com> wrote:
Thank you for persisting - you've found another hole that needs to be plugged. It sounds like you are proposing that after a qemu process dies, that libvirt re-reads the qcow2 metadata headers, and validates that the backing file information has not changed in a manner unexpected by libvirt. If it has, then the qemu process that just died was compromised to the point that restarting a new qemu process from the old image is now a security risk. So this is _yet another_ security aspect that needs to be coded into libvirt as part of hardening sVirt.
The backing file information changes when image streaming completes.
Before: fedora.img <- my_vm.qed After: my_vm.qed (fedora.img is no longer referenced)
The image streaming operation copies data out of fedora.img and populates my_vm.qed. When image streaming completes, the backing file is no longer needed and my_vm.qed is updated to drop the backing file.
I think we need to design carefully to prevent QEMU and libvirt making incorrect assumptions about who does what. I really wish that all this image file business was outside QEMU and libvirt - that we had a separate storage management service which handled the details. QEMU would only do block device operations (no image format manipulation), and libvirt would only delegate to the storage management service. Today we seem to be sprinkling a little bit of storage management into QEMU and a little bit into libvirt :(.
In that spirit it is much nicer to think of storage like a SAN appliance where you have LUNs that you access as block devices. It also provides an API for snapshotting, cloning LUNs, etc.
Let's move to that model instead of worrying about how to spread storage logic across QEMU and libvirt.
Would NBD protocol fit to this purpose, or is it too simple? Then libvirt would handle the storage format completely and present an NBD interface to QEMU (or give an fd to an external service) and QEMU would not care about the storage format in this mode at all.
NBD does not support flush (fdatasync). Therefore it only supports the slow cache=writethrough mode in a safe manner. It would be neat to use virtio-blk as the interface because it can be passed through to the guest. The guest talks directly to the storage management service without going through QEMU. The trick is to do something like vhost: 1. An ioeventfd for virtqueue (guest->host) kicks 2. An irqfd for host->guest kicks 3. Shared memory for vring and zero-copy data access The storage management service provides a UNIX domain socket over which fds can be passed to set up the vhost-like virtio-blk interface. Moving the image format code into a separate program makes it possible to safely write to a backing file while VMs are using it because the storage service can be host-wide, not per-VM. For example, streaming a shared backing file over NFS while running VMs using copy-on-write images. If we ever want to do deduplication or other global operations, then this approach is nice too. To summarize: The storage service manages image files including creation, deletion, snapshotting, and actual I/O. QEMU uses a vhost-like virtio-blk interface and can pass it directly into the guest. libvirt uses the storage service API without needing to parse image files or keep track of backing file relationships. Stefan

On Fri, Jul 22, 2011 at 8:06 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
On Thu, Jul 21, 2011 at 8:42 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
On Thu, Jul 21, 2011 at 6:01 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
On Thu, Jul 21, 2011 at 3:02 PM, Eric Blake <eblake@redhat.com> wrote:
Thank you for persisting - you've found another hole that needs to be plugged. It sounds like you are proposing that after a qemu process dies, that libvirt re-reads the qcow2 metadata headers, and validates that the backing file information has not changed in a manner unexpected by libvirt. If it has, then the qemu process that just died was compromised to the point that restarting a new qemu process from the old image is now a security risk. So this is _yet another_ security aspect that needs to be coded into libvirt as part of hardening sVirt.
The backing file information changes when image streaming completes.
Before: fedora.img <- my_vm.qed After: my_vm.qed (fedora.img is no longer referenced)
The image streaming operation copies data out of fedora.img and populates my_vm.qed. When image streaming completes, the backing file is no longer needed and my_vm.qed is updated to drop the backing file.
I think we need to design carefully to prevent QEMU and libvirt making incorrect assumptions about who does what. I really wish that all this image file business was outside QEMU and libvirt - that we had a separate storage management service which handled the details. QEMU would only do block device operations (no image format manipulation), and libvirt would only delegate to the storage management service. Today we seem to be sprinkling a little bit of storage management into QEMU and a little bit into libvirt :(.
In that spirit it is much nicer to think of storage like a SAN appliance where you have LUNs that you access as block devices. It also provides an API for snapshotting, cloning LUNs, etc.
Let's move to that model instead of worrying about how to spread storage logic across QEMU and libvirt.
Would NBD protocol fit to this purpose, or is it too simple? Then libvirt would handle the storage format completely and present an NBD interface to QEMU (or give an fd to an external service) and QEMU would not care about the storage format in this mode at all.
NBD does not support flush (fdatasync). Therefore it only supports the slow cache=writethrough mode in a safe manner.
Maybe NBD could still be used in networked setups as a secondary alternative.
It would be neat to use virtio-blk as the interface because it can be passed through to the guest. The guest talks directly to the storage management service without going through QEMU. The trick is to do something like vhost: 1. An ioeventfd for virtqueue (guest->host) kicks 2. An irqfd for host->guest kicks 3. Shared memory for vring and zero-copy data access
The storage management service provides a UNIX domain socket over which fds can be passed to set up the vhost-like virtio-blk interface.
Moving the image format code into a separate program makes it possible to safely write to a backing file while VMs are using it because the storage service can be host-wide, not per-VM. For example, streaming a shared backing file over NFS while running VMs using copy-on-write images. If we ever want to do deduplication or other global operations, then this approach is nice too.
To summarize: The storage service manages image files including creation, deletion, snapshotting, and actual I/O. QEMU uses a vhost-like virtio-blk interface and can pass it directly into the guest. libvirt uses the storage service API without needing to parse image files or keep track of backing file relationships.
Excellent plan. If one day kernel provides builtin virtio-blk services which can be passed via libvirt and QEMU to the guest, we'll even have zero copy all the way.

Am 21.07.2011 17:01, schrieb Stefan Hajnoczi:
On Thu, Jul 21, 2011 at 3:02 PM, Eric Blake <eblake@redhat.com> wrote:
Thank you for persisting - you've found another hole that needs to be plugged. It sounds like you are proposing that after a qemu process dies, that libvirt re-reads the qcow2 metadata headers, and validates that the backing file information has not changed in a manner unexpected by libvirt. If it has, then the qemu process that just died was compromised to the point that restarting a new qemu process from the old image is now a security risk. So this is _yet another_ security aspect that needs to be coded into libvirt as part of hardening sVirt.
The backing file information changes when image streaming completes.
Before: fedora.img <- my_vm.qed After: my_vm.qed (fedora.img is no longer referenced)
The image streaming operation copies data out of fedora.img and populates my_vm.qed. When image streaming completes, the backing file is no longer needed and my_vm.qed is updated to drop the backing file.
I think we need to design carefully to prevent QEMU and libvirt making incorrect assumptions about who does what. I really wish that all this image file business was outside QEMU and libvirt - that we had a separate storage management service which handled the details. QEMU would only do block device operations (no image format manipulation), and libvirt would only delegate to the storage management service.
And how do you implement that in a way that works on all platforms, and without root privileges? I can't see this happen unless it stays completely optional. Kevin

On Fri, Jul 22, 2011 at 8:22 AM, Kevin Wolf <kwolf@redhat.com> wrote:
Am 21.07.2011 17:01, schrieb Stefan Hajnoczi:
On Thu, Jul 21, 2011 at 3:02 PM, Eric Blake <eblake@redhat.com> wrote:
Thank you for persisting - you've found another hole that needs to be plugged. It sounds like you are proposing that after a qemu process dies, that libvirt re-reads the qcow2 metadata headers, and validates that the backing file information has not changed in a manner unexpected by libvirt. If it has, then the qemu process that just died was compromised to the point that restarting a new qemu process from the old image is now a security risk. So this is _yet another_ security aspect that needs to be coded into libvirt as part of hardening sVirt.
The backing file information changes when image streaming completes.
Before: fedora.img <- my_vm.qed After: my_vm.qed (fedora.img is no longer referenced)
The image streaming operation copies data out of fedora.img and populates my_vm.qed. When image streaming completes, the backing file is no longer needed and my_vm.qed is updated to drop the backing file.
I think we need to design carefully to prevent QEMU and libvirt making incorrect assumptions about who does what. I really wish that all this image file business was outside QEMU and libvirt - that we had a separate storage management service which handled the details. QEMU would only do block device operations (no image format manipulation), and libvirt would only delegate to the storage management service.
And how do you implement that in a way that works on all platforms, and without root privileges? I can't see this happen unless it stays completely optional.
The cross-platform way would be an iSCSI target that understands image formats. But iSCSI requires copying when doing I/O and we can't pass through virtio-blk. I'm not sure I see the root privilege issue. Stefan

On Fri, Jul 22, 2011 at 12:11 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
On Fri, Jul 22, 2011 at 8:22 AM, Kevin Wolf <kwolf@redhat.com> wrote:
Am 21.07.2011 17:01, schrieb Stefan Hajnoczi:
On Thu, Jul 21, 2011 at 3:02 PM, Eric Blake <eblake@redhat.com> wrote:
Thank you for persisting - you've found another hole that needs to be plugged. It sounds like you are proposing that after a qemu process dies, that libvirt re-reads the qcow2 metadata headers, and validates that the backing file information has not changed in a manner unexpected by libvirt. If it has, then the qemu process that just died was compromised to the point that restarting a new qemu process from the old image is now a security risk. So this is _yet another_ security aspect that needs to be coded into libvirt as part of hardening sVirt.
The backing file information changes when image streaming completes.
Before: fedora.img <- my_vm.qed After: my_vm.qed (fedora.img is no longer referenced)
The image streaming operation copies data out of fedora.img and populates my_vm.qed. When image streaming completes, the backing file is no longer needed and my_vm.qed is updated to drop the backing file.
I think we need to design carefully to prevent QEMU and libvirt making incorrect assumptions about who does what. I really wish that all this image file business was outside QEMU and libvirt - that we had a separate storage management service which handled the details. QEMU would only do block device operations (no image format manipulation), and libvirt would only delegate to the storage management service.
And how do you implement that in a way that works on all platforms, and without root privileges? I can't see this happen unless it stays completely optional.
The cross-platform way would be an iSCSI target that understands image formats. But iSCSI requires copying when doing I/O and we can't pass through virtio-blk.
The guest could use iSCSI directly using the network interface without virtio-blk. This setup wouldn't give max performance in local use but it could also be useful in some networked setups and probably more useful than NBD.

Am 19.07.2011 18:46, schrieb Daniel P. Berrange:
On Tue, Jul 19, 2011 at 04:14:27PM +0100, Stefan Hajnoczi wrote:
On Tue, Jul 19, 2011 at 3:30 PM, Jes Sorensen <Jes.Sorensen@redhat.com> wrote:
On 07/19/11 16:24, Eric Blake wrote:
[adding the libvir-list] On 07/19/2011 08:09 AM, Jes Sorensen wrote:
Urgh, libvirt parsing image files is really unfortunate, it really doesn't give me warm fuzzy feelings :( libvirt really should not know about internals of image formats.
But even if you add new features to qemu to avoid needing this in the future, it doesn't change the past - libvirt will always have to know how to parse image files understood by older qemu, and so as long as libvirt already knows how to do that parsing, we might as well take advantage of it.
What has been done here in the past is plain wrong. Continuing to do it isn't the right thing to do here.
Besides, I feel that having a well-documented file format, so that independent applications can both parse the same file with the same semantics by obeying the file format specification, is a good design goal.
We all know that documentation is rarely uptodate, new features may not get added and libvirt will never be able to keep up. The driver for a file format belongs in QEMU and nowhere else.
It should be a goal to avoid dependencies in multiple layers of the stack because it becomes are to add new features - they require coordinated changes in multiple layers. Having both QEMU and libvirt know the internals of image files is such a multi-dependency. If I want to add a new format or change an existing format I have to touch both layers.
For fd-passing perhaps we have an opportunity to use a callback mechanism (QEMU request: filename -> libvirt response: fd) and do all the image format parsing in QEMU.
The reason why libvirt does the parsing of file headers to determine backing files is to maintain the trust boundary. Everything run from the exec() of QEMU onwards is considered untrusted code. So having QEMU parsing the file headers & passing back open() requests to libvirt is breaking the trust boundary.
NB, i'm not happy about libvirt having to have knowledge of file format headers, but we needed something more efficient & reliable than invoking qemu-img info & parsing the output.
What's the real problem with this approach? Parsing the data meant for humans, from an interface that is potentially unstable? If this is it, it should be easy enough to add a JSON output mode to qemu-img info.
Ideally QEMU (or something else) would provide a library libblockformat.so with stable APIs for at least reading metadata about image formats. If it had APIs for image creation, etc too that would be a bonus, but we're more or less ok spawning qemu-img for those cases currently.
I'm afraid the block drivers have too many dependencies on the qemu core for this to be an option without investing a lot of effort. Kevin

On Wed, Jul 20, 2011 at 11:50:53AM +0200, Kevin Wolf wrote:
Am 19.07.2011 18:46, schrieb Daniel P. Berrange:
On Tue, Jul 19, 2011 at 04:14:27PM +0100, Stefan Hajnoczi wrote:
On Tue, Jul 19, 2011 at 3:30 PM, Jes Sorensen <Jes.Sorensen@redhat.com> wrote:
On 07/19/11 16:24, Eric Blake wrote:
[adding the libvir-list] On 07/19/2011 08:09 AM, Jes Sorensen wrote:
Urgh, libvirt parsing image files is really unfortunate, it really doesn't give me warm fuzzy feelings :( libvirt really should not know about internals of image formats.
But even if you add new features to qemu to avoid needing this in the future, it doesn't change the past - libvirt will always have to know how to parse image files understood by older qemu, and so as long as libvirt already knows how to do that parsing, we might as well take advantage of it.
What has been done here in the past is plain wrong. Continuing to do it isn't the right thing to do here.
Besides, I feel that having a well-documented file format, so that independent applications can both parse the same file with the same semantics by obeying the file format specification, is a good design goal.
We all know that documentation is rarely uptodate, new features may not get added and libvirt will never be able to keep up. The driver for a file format belongs in QEMU and nowhere else.
It should be a goal to avoid dependencies in multiple layers of the stack because it becomes are to add new features - they require coordinated changes in multiple layers. Having both QEMU and libvirt know the internals of image files is such a multi-dependency. If I want to add a new format or change an existing format I have to touch both layers.
For fd-passing perhaps we have an opportunity to use a callback mechanism (QEMU request: filename -> libvirt response: fd) and do all the image format parsing in QEMU.
The reason why libvirt does the parsing of file headers to determine backing files is to maintain the trust boundary. Everything run from the exec() of QEMU onwards is considered untrusted code. So having QEMU parsing the file headers & passing back open() requests to libvirt is breaking the trust boundary.
NB, i'm not happy about libvirt having to have knowledge of file format headers, but we needed something more efficient & reliable than invoking qemu-img info & parsing the output.
What's the real problem with this approach? Parsing the data meant for humans, from an interface that is potentially unstable? If this is it, it should be easy enough to add a JSON output mode to qemu-img info.
It is a really heavyweight solution to have to spawn qemu-img every time we need to access this data, when it can be done with a trivial open+read+close sequence. In addition the output data format is not entirely pleasant for machine reading (some fields only have data rounded up to MB, not the raw byte count). Finally, we also wanted to be able to extract some basic metdata about disk image formats on non-QEMU hosts, for our storage management APIs which are used on Xen or VMWare hosts where many of these same disk image formats are also used. A JSON output mode would be helpful, but unfortunately can't really address the other issues.
Ideally QEMU (or something else) would provide a library libblockformat.so with stable APIs for at least reading metadata about image formats. If it had APIs for image creation, etc too that would be a bonus, but we're more or less ok spawning qemu-img for those cases currently.
I'm afraid the block drivers have too many dependencies on the qemu core for this to be an option without investing a lot of effort.
That's why I sort of think there is value in having someone provide a standalone library API for querying some core set of block format metadata. QEMU is but one project with virtual disk formats, there are plenty of others out there in existance, so while reusing QEMU block code would be nice, it isn't leading to any significant reduction in copies of block format parsing code amongst all the virt projects in existance. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 07/19/2011 09:30 AM, Jes Sorensen wrote:
On 07/19/11 16:24, Eric Blake wrote:
[adding the libvir-list] On 07/19/2011 08:09 AM, Jes Sorensen wrote:
Urgh, libvirt parsing image files is really unfortunate, it really doesn't give me warm fuzzy feelings :( libvirt really should not know about internals of image formats.
But even if you add new features to qemu to avoid needing this in the future, it doesn't change the past - libvirt will always have to know how to parse image files understood by older qemu, and so as long as libvirt already knows how to do that parsing, we might as well take advantage of it.
What has been done here in the past is plain wrong. Continuing to do it isn't the right thing to do here.
Besides, I feel that having a well-documented file format, so that independent applications can both parse the same file with the same semantics by obeying the file format specification, is a good design goal.
We all know that documentation is rarely uptodate, new features may not get added and libvirt will never be able to keep up. The driver for a file format belongs in QEMU and nowhere else.
It would be nice if libvirt had a way to pass fds for every disk and backing file up front; then, SELinux can work around the lack of NFS per-file labelling by blocking open() in qemu. In fact, this has already been proposed:
A cleaner solution seems to have libvirt provide a call-back allowing QEMU to call out and have libvirt open a file descriptor instead. This way libvirt can validate it and open it for QEMU and pass it back.
Yes, that could probably be made to work with libvirt.
I am a little frustrated this approach wasn't taken up front instead of the evil hack of having libvirt attempt to parse image files.
If we cannot do something like this, I would prefer to have backing files on NFS should simply not be supported when running in an selinux setup.
As nice as that sentiment is, it will never fly, because it would be a regression in current behavior. The whole reason that the virt_use_nfs SELinux bool exists is that some people are willing to make the partial security tradeoff. Besides, the use of sVirt via SELinux is more than just open() protection - while the current virt_use_nfs bool makes NFS less secure than otherwise possible, it still gives some nice guarantees to the rest of the qemu process such as passthrough accesses to local pci devices.
Well leaving things at status quo is not making it worse, it just leaves an evil in place.
NFS and SELinux is a fundamental problem with SELinux and NFS. We can piss and moan as much as we want about it but it's reality. SELinux fundamentally requires extended attributes. By the time NFS adds extended attribute support, we'll all be flying around in hover cars. As terrible as NFS is, people use it all of the time. It would be nice if libvirt had the ability to make better use of DAC to support isolation. The fact that MAC is the only way you can do isolation between guests is pretty unfortunate. If I could assign specific UIDs to a guest and use that to enforce isolation, it would go a long ways to solving this problem. Regards, Anthony Liguori

On 07/19/11 18:14, Anthony Liguori wrote:
As nice as that sentiment is, it will never fly, because it would be a regression in current behavior. The whole reason that the virt_use_nfs SELinux bool exists is that some people are willing to make the partial security tradeoff. Besides, the use of sVirt via SELinux is more than just open() protection - while the current virt_use_nfs bool makes NFS less secure than otherwise possible, it still gives some nice guarantees to the rest of the qemu process such as passthrough accesses to local pci devices.
Well leaving things at status quo is not making it worse, it just leaves an evil in place.
NFS and SELinux is a fundamental problem with SELinux and NFS. We can piss and moan as much as we want about it but it's reality. SELinux fundamentally requires extended attributes. By the time NFS adds extended attribute support, we'll all be flying around in hover cars.
As terrible as NFS is, people use it all of the time.
It would be nice if libvirt had the ability to make better use of DAC to support isolation. The fact that MAC is the only way you can do isolation between guests is pretty unfortunate. If I could assign specific UIDs to a guest and use that to enforce isolation, it would go a long ways to solving this problem.
Right, we're stuck with the two horros of NFS and selinux, so we need something that gets around the problem. In a sane world we would simply say 'no NFS, no selinux', but as you say that will never happen. My suggestion of a callback mechanism where libvirt registers the callback with QEMU for open() calls, allowing libvirt to perform the open and return the open file descriptor would get around this problem. Jes

Am 20.07.2011 10:25, schrieb Jes Sorensen:
On 07/19/11 18:14, Anthony Liguori wrote:
As nice as that sentiment is, it will never fly, because it would be a regression in current behavior. The whole reason that the virt_use_nfs SELinux bool exists is that some people are willing to make the partial security tradeoff. Besides, the use of sVirt via SELinux is more than just open() protection - while the current virt_use_nfs bool makes NFS less secure than otherwise possible, it still gives some nice guarantees to the rest of the qemu process such as passthrough accesses to local pci devices.
Well leaving things at status quo is not making it worse, it just leaves an evil in place.
NFS and SELinux is a fundamental problem with SELinux and NFS. We can piss and moan as much as we want about it but it's reality. SELinux fundamentally requires extended attributes. By the time NFS adds extended attribute support, we'll all be flying around in hover cars.
As terrible as NFS is, people use it all of the time.
It would be nice if libvirt had the ability to make better use of DAC to support isolation. The fact that MAC is the only way you can do isolation between guests is pretty unfortunate. If I could assign specific UIDs to a guest and use that to enforce isolation, it would go a long ways to solving this problem.
Right, we're stuck with the two horros of NFS and selinux, so we need something that gets around the problem. In a sane world we would simply say 'no NFS, no selinux', but as you say that will never happen.
My suggestion of a callback mechanism where libvirt registers the callback with QEMU for open() calls, allowing libvirt to perform the open and return the open file descriptor would get around this problem.
To me this sounds more like a problem than a solution. It basically means that during an open (which may even be initiated by a monitor command), you need monitor interaction. It basically means that open becomes asynchronous, and requires clients to deal with that, which sounds at least "interesting"... Also you have to add some magic to all places opening something. I think if libvirt wants qemu to use an fd instead of a file name, it shouldn't pass a file name but an fd in the first place. Which means that the two that we need are support for an fd: protocol (patches on the list, need review), and a way for libvirt to override the backing file of an image. Kevin

On 07/20/11 12:01, Kevin Wolf wrote:
Right, we're stuck with the two horros of NFS and selinux, so we need something that gets around the problem. In a sane world we would simply say 'no NFS, no selinux', but as you say that will never happen.
My suggestion of a callback mechanism where libvirt registers the callback with QEMU for open() calls, allowing libvirt to perform the open and return the open file descriptor would get around this problem. To me this sounds more like a problem than a solution. It basically means that during an open (which may even be initiated by a monitor command), you need monitor interaction. It basically means that open becomes asynchronous, and requires clients to deal with that, which sounds at least "interesting"... Also you have to add some magic to all places opening something.
I think if libvirt wants qemu to use an fd instead of a file name, it shouldn't pass a file name but an fd in the first place. Which means that the two that we need are support for an fd: protocol (patches on the list, need review), and a way for libvirt to override the backing file of an image.
The problem is that QEMU will find backing file file names inside the images which it will be unable to open. How do you suggest we get around that? Jes

On 07/20/2011 07:25 AM, Jes Sorensen wrote:
I think if libvirt wants qemu to use an fd instead of a file name, it shouldn't pass a file name but an fd in the first place. Which means that the two that we need are support for an fd: protocol (patches on the list, need review), and a way for libvirt to override the backing file of an image.
The problem is that QEMU will find backing file file names inside the images which it will be unable to open. How do you suggest we get around that?
We've already told you - qemu must have a way to be passed fds which are associated with names, and when a file refers to another backing file by name, then qemu falls back on its fd/name mapping to use the already-passed fd instead. Which implies that someone else, either libvirt or a qemu-maintained libblockformat.so, needs to have a stable interface for parsing the backing file name out of an arbitrary qcow2 file, and that this interface must work no matter how many other extensions are added to qcow2. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On Wed, Jul 20, 2011 at 4:46 PM, Eric Blake <eblake@redhat.com> wrote:
On 07/20/2011 07:25 AM, Jes Sorensen wrote:
I think if libvirt wants qemu to use an fd instead of a file name, it shouldn't pass a file name but an fd in the first place. Which means that the two that we need are support for an fd: protocol (patches on the list, need review), and a way for libvirt to override the backing file of an image.
The problem is that QEMU will find backing file file names inside the images which it will be unable to open. How do you suggest we get around that?
We've already told you - qemu must have a way to be passed fds which are associated with names, and when a file refers to another backing file by name, then qemu falls back on its fd/name mapping to use the already-passed fd instead. Which implies that someone else, either libvirt or a qemu-maintained libblockformat.so, needs to have a stable interface for parsing the backing file name out of an arbitrary qcow2 file, and that this interface must work no matter how many other extensions are added to qcow2.
I'd avoid any name based access in this case. If QEMU has write access to main file, it could forge the backing file name in main file to point to for example /etc/shadow and then request libvirt to perform the opening.

On 07/20/2011 11:27 AM, Blue Swirl wrote:
We've already told you - qemu must have a way to be passed fds which are associated with names, and when a file refers to another backing file by name, then qemu falls back on its fd/name mapping to use the already-passed fd instead. Which implies that someone else, either libvirt or a qemu-maintained libblockformat.so, needs to have a stable interface for parsing the backing file name out of an arbitrary qcow2 file, and that this interface must work no matter how many other extensions are added to qcow2.
I'd avoid any name based access in this case. If QEMU has write access to main file, it could forge the backing file name in main file to point to for example /etc/shadow and then request libvirt to perform the opening.
Won't work. Well, it might work within the context of a single qemu process. But when that process ends, then libvirt would have to touch up the qcow2 headers of that file to replace the /etc/shadow name with the real backing file name, otherwise, the next time you restart qemu-img or a new qemu guest using the same image, the information has been lost, since the fd has been closed in the meantime. We really _do_ need a way to give qemu both an fd and the name of the file that the fd is tied to. On Linux, qemu could use /proc/self/fd to reconstruct the name from fd, but that's not portable to other OS. And we've already discussed how in the libvirt model, that libvirt is deemed more secure than qemu. Therefore, I think it is reasonable for qemu to make the assumptions that if it exposes a monitor command where the supervisor (libvirt or otherwise) can pass in both an fd and a file name, that either the supervisor is passing in correct information, or that the bug is in the supervisor and not in qemu if the supervisor passes in wrong information and things blow up. And the snapshot_blkdev monitor command is a case where qemu needs to create a new qcow2 image on the fly, while referencing the name of an existing file. What backing name do you put in the new qcow2 file unless you already have a name association for all fds already open for the existing backing file? -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On Wed, Jul 20, 2011 at 8:47 PM, Eric Blake <eblake@redhat.com> wrote:
On 07/20/2011 11:27 AM, Blue Swirl wrote:
We've already told you - qemu must have a way to be passed fds which are associated with names, and when a file refers to another backing file by name, then qemu falls back on its fd/name mapping to use the already-passed fd instead. Which implies that someone else, either libvirt or a qemu-maintained libblockformat.so, needs to have a stable interface for parsing the backing file name out of an arbitrary qcow2 file, and that this interface must work no matter how many other extensions are added to qcow2.
I'd avoid any name based access in this case. If QEMU has write access to main file, it could forge the backing file name in main file to point to for example /etc/shadow and then request libvirt to perform the opening.
Won't work. Well, it might work within the context of a single qemu process. But when that process ends, then libvirt would have to touch up the qcow2 headers of that file to replace the /etc/shadow name with the real backing file name, otherwise, the next time you restart qemu-img or a new qemu guest using the same image, the information has been lost, since the fd has been closed in the meantime.
How would libvirt know to do this touch up?
We really _do_ need a way to give qemu both an fd and the name of the file that the fd is tied to. On Linux, qemu could use /proc/self/fd to reconstruct the name from fd, but that's not portable to other OS. And we've already discussed how in the libvirt model, that libvirt is deemed more secure than qemu. Therefore, I think it is reasonable for qemu to make the assumptions that if it exposes a monitor command where the supervisor (libvirt or otherwise) can pass in both an fd and a file name, that either the supervisor is passing in correct information, or that the bug is in the supervisor and not in qemu if the supervisor passes in wrong information and things blow up.
Yes, I'm not worried about QEMU getting confused by libvirt.
And the snapshot_blkdev monitor command is a case where qemu needs to create a new qcow2 image on the fly, while referencing the name of an existing file. What backing name do you put in the new qcow2 file unless you already have a name association for all fds already open for the existing backing file?
For backing file with original name of "/path/in/storage", QEMU could present the fd and the backin path by requesting something like "fd:12,/path/in/storage". The next file in chain "/path2/file" would be "fd:12,fd:34,/path2/file". Or if possible, -fd 12 -backing /path/in/storage with spaces and funny characters escaped etc.

On 07/20/11 21:51, Blue Swirl wrote:
a new qcow2 image on the fly, while referencing the name of an existing file. What backing name do you put in the new qcow2 file unless you already have a name association for all fds already open for the existing backing file? For backing file with original name of "/path/in/storage", QEMU could
And the snapshot_blkdev monitor command is a case where qemu needs to create present the fd and the backin path by requesting something like "fd:12,/path/in/storage". The next file in chain "/path2/file" would be "fd:12,fd:34,/path2/file". Or if possible, -fd 12 -backing /path/in/storage with spaces and funny characters escaped etc.
Rather than trying to do this by mangling files on the disk, I would suggest we allow registering a call-back open function, which calls back into libvirt and requests it to open a given file. It can then do all it's security foo to decide whether or not to allow the file to be open. This is relatively clean and avoids the mess of relying on outside processes messing about in the images. Cheers, Jes

Am 21.07.2011 10:07, schrieb Jes Sorensen:
On 07/20/11 21:51, Blue Swirl wrote:
a new qcow2 image on the fly, while referencing the name of an existing file. What backing name do you put in the new qcow2 file unless you already have a name association for all fds already open for the existing backing file? For backing file with original name of "/path/in/storage", QEMU could
And the snapshot_blkdev monitor command is a case where qemu needs to create present the fd and the backin path by requesting something like "fd:12,/path/in/storage". The next file in chain "/path2/file" would be "fd:12,fd:34,/path2/file". Or if possible, -fd 12 -backing /path/in/storage with spaces and funny characters escaped etc.
Rather than trying to do this by mangling files on the disk, I would suggest we allow registering a call-back open function, which calls back into libvirt and requests it to open a given file. It can then do all it's security foo to decide whether or not to allow the file to be open.
This is relatively clean and avoids the mess of relying on outside processes messing about in the images.
You forget that libvirt parses images exactly to decide whether to allow accesses or not, so they would still do it. Which shouldn't be a big problem anyway as long as the libvirt implementation is compliant to the image format specification (even though it's not nice, of course). I just wonder why libvirt doesn't trust qemu enough that it can open() what it wants, but at the same time relies for this check on information in image files which this very same qemu can modify. Kevin

On 07/21/11 10:36, Kevin Wolf wrote:
Am 21.07.2011 10:07, schrieb Jes Sorensen:
Rather than trying to do this by mangling files on the disk, I would suggest we allow registering a call-back open function, which calls back into libvirt and requests it to open a given file. It can then do all it's security foo to decide whether or not to allow the file to be open.
This is relatively clean and avoids the mess of relying on outside processes messing about in the images.
You forget that libvirt parses images exactly to decide whether to allow accesses or not, so they would still do it. Which shouldn't be a big problem anyway as long as the libvirt implementation is compliant to the image format specification (even though it's not nice, of course).
As I just responded to Daniel, this doesn't make sense. If libvirt wants to guard against a compromised QEMU, it cannot rely on things written into image files headers by QEMU. If we trust contents of image files, there is no reason not to grant QEMU full access to files on NFS either, as we have effectively already granted that by trusting the image file content. In other words, if the goal is to make this secure, lets make it secure, otherwise, lets stop pretending.
I just wonder why libvirt doesn't trust qemu enough that it can open() what it wants, but at the same time relies for this check on information in image files which this very same qemu can modify.
Probably for the same reason few QEMU developers trust libvirt.... Jes

On Thu, Jul 21, 2011 at 11:07 AM, Jes Sorensen <Jes.Sorensen@redhat.com> wrote:
On 07/20/11 21:51, Blue Swirl wrote:
a new qcow2 image on the fly, while referencing the name of an existing file. What backing name do you put in the new qcow2 file unless you already have a name association for all fds already open for the existing backing file? For backing file with original name of "/path/in/storage", QEMU could
And the snapshot_blkdev monitor command is a case where qemu needs to create present the fd and the backin path by requesting something like "fd:12,/path/in/storage". The next file in chain "/path2/file" would be "fd:12,fd:34,/path2/file". Or if possible, -fd 12 -backing /path/in/storage with spaces and funny characters escaped etc.
Rather than trying to do this by mangling files on the disk, I would suggest we allow registering a call-back open function, which calls back into libvirt and requests it to open a given file. It can then do all it's security foo to decide whether or not to allow the file to be open.
Just to clarify: I was not proposing any mangling of the files.
This is relatively clean and avoids the mess of relying on outside processes messing about in the images.
Cheers, Jes

On 07/20/11 19:47, Eric Blake wrote:
On 07/20/2011 11:27 AM, Blue Swirl wrote:
I'd avoid any name based access in this case. If QEMU has write access to main file, it could forge the backing file name in main file to point to for example /etc/shadow and then request libvirt to perform the opening.
Won't work. Well, it might work within the context of a single qemu process. But when that process ends, then libvirt would have to touch up the qcow2 headers of that file to replace the /etc/shadow name with the real backing file name, otherwise, the next time you restart qemu-img or a new qemu guest using the same image, the information has been lost, since the fd has been closed in the meantime.
We really _do_ need a way to give qemu both an fd and the name of the file that the fd is tied to. On Linux, qemu could use /proc/self/fd to reconstruct the name from fd, but that's not portable to other OS. And we've already discussed how in the libvirt model, that libvirt is deemed more secure than qemu.
I am sorry, but this is where your assertion fails completely. QEMU cannot trust libvirt being able to parse it's image files correctly, so this is a total no-go.
Therefore, I think it is reasonable for qemu to make the assumptions that if it exposes a monitor command where the supervisor (libvirt or otherwise) can pass in both an fd and a file name, that either the supervisor is passing in correct information, or that the bug is in the supervisor and not in qemu if the supervisor passes in wrong information and things blow up.
Sorry but this is not a reasonable situation given how you want to abuse it.
And the snapshot_blkdev monitor command is a case where qemu needs to create a new qcow2 image on the fly, while referencing the name of an existing file. What backing name do you put in the new qcow2 file unless you already have a name association for all fds already open for the existing backing file?
It is more than that, the problem is also there when you try to open an image that has a backing file in an selinux environment where qemu isn't allowed to open the backing file. This problem needs a proper solution which does not involve libvirt messing about in binary files where it has no business of being. Jes

Am 20.07.2011 19:47, schrieb Eric Blake:
We really _do_ need a way to give qemu both an fd and the name of the file that the fd is tied to. On Linux, qemu could use /proc/self/fd to reconstruct the name from fd, but that's not portable to other OS.
Is there any reason for qemu to know the file name if libvirt wants it to use only pre-opened fds anyway? I don't think it should even know the file name.
And we've already discussed how in the libvirt model, that libvirt is deemed more secure than qemu. Therefore, I think it is reasonable for qemu to make the assumptions that if it exposes a monitor command where the supervisor (libvirt or otherwise) can pass in both an fd and a file name, that either the supervisor is passing in correct information, or that the bug is in the supervisor and not in qemu if the supervisor passes in wrong information and things blow up.
And the snapshot_blkdev monitor command is a case where qemu needs to create a new qcow2 image on the fly, while referencing the name of an existing file. What backing name do you put in the new qcow2 file unless you already have a name association for all fds already open for the existing backing file?
Without adding anything, it would put in fd:42. Absolutely useless, but it's a managed VM anyway and libvirt is overriding the backing file with a new fd each time it runs the VM, so it doesn't really matter. It starts to matter if you use the image outside libvirt, then it would be nice to have the right information there. That may be a point for adding the "real" file name to snapshot_blkdev. Kevin

On 07/21/11 10:18, Kevin Wolf wrote:
Am 20.07.2011 19:47, schrieb Eric Blake:
We really _do_ need a way to give qemu both an fd and the name of the file that the fd is tied to. On Linux, qemu could use /proc/self/fd to reconstruct the name from fd, but that's not portable to other OS. Is there any reason for qemu to know the file name if libvirt wants it to use only pre-opened fds anyway? I don't think it should even know the file name.
In this case we shouldn't write backing file filenames into image files, but some sort of libvirt token...... then again that will be a lot of fun if you want to run things on the command line afterwards. Jes

On 07/20/11 15:46, Eric Blake wrote:
On 07/20/2011 07:25 AM, Jes Sorensen wrote:
I think if libvirt wants qemu to use an fd instead of a file name, it shouldn't pass a file name but an fd in the first place. Which means that the two that we need are support for an fd: protocol (patches on the list, need review), and a way for libvirt to override the backing file of an image.
The problem is that QEMU will find backing file file names inside the images which it will be unable to open. How do you suggest we get around that?
We've already told you - qemu must have a way to be passed fds which are associated with names, and when a file refers to another backing file by name, then qemu falls back on its fd/name mapping to use the already-passed fd instead. Which implies that someone else, either libvirt or a qemu-maintained libblockformat.so, needs to have a stable interface for parsing the backing file name out of an arbitrary qcow2 file, and that this interface must work no matter how many other extensions are added to qcow2.
As I replied to you earlier, libvirt is *not* allowed to be messing with internals of image files! Passing in backing file fds as a result of libvirt messing around in internals of the image headers is utterly broken and is not going to happen! Jes

Am 20.07.2011 15:25, schrieb Jes Sorensen:
On 07/20/11 12:01, Kevin Wolf wrote:
Right, we're stuck with the two horros of NFS and selinux, so we need something that gets around the problem. In a sane world we would simply say 'no NFS, no selinux', but as you say that will never happen.
My suggestion of a callback mechanism where libvirt registers the callback with QEMU for open() calls, allowing libvirt to perform the open and return the open file descriptor would get around this problem. To me this sounds more like a problem than a solution. It basically means that during an open (which may even be initiated by a monitor command), you need monitor interaction. It basically means that open becomes asynchronous, and requires clients to deal with that, which sounds at least "interesting"... Also you have to add some magic to all places opening something.
I think if libvirt wants qemu to use an fd instead of a file name, it shouldn't pass a file name but an fd in the first place. Which means that the two that we need are support for an fd: protocol (patches on the list, need review), and a way for libvirt to override the backing file of an image.
The problem is that QEMU will find backing file file names inside the images which it will be unable to open. How do you suggest we get around that?
This is the part with allowing libvirt to override the backing file. Of course, this is not something that we can add with five lines of code, it requires -blockdev. Kevin

On Wed, Jul 20, 2011 at 4:51 PM, Kevin Wolf <kwolf@redhat.com> wrote:
Am 20.07.2011 15:25, schrieb Jes Sorensen:
On 07/20/11 12:01, Kevin Wolf wrote:
Right, we're stuck with the two horros of NFS and selinux, so we need something that gets around the problem. In a sane world we would simply say 'no NFS, no selinux', but as you say that will never happen.
My suggestion of a callback mechanism where libvirt registers the callback with QEMU for open() calls, allowing libvirt to perform the open and return the open file descriptor would get around this problem. To me this sounds more like a problem than a solution. It basically means that during an open (which may even be initiated by a monitor command), you need monitor interaction. It basically means that open becomes asynchronous, and requires clients to deal with that, which sounds at least "interesting"... Also you have to add some magic to all places opening something.
I think if libvirt wants qemu to use an fd instead of a file name, it shouldn't pass a file name but an fd in the first place. Which means that the two that we need are support for an fd: protocol (patches on the list, need review), and a way for libvirt to override the backing file of an image.
The problem is that QEMU will find backing file file names inside the images which it will be unable to open. How do you suggest we get around that?
This is the part with allowing libvirt to override the backing file. Of course, this is not something that we can add with five lines of code, it requires -blockdev.
There could still be some issues: Let's have files A, B, C etc. with backing files AA etc. How would libvirt know that when QEMU wants to write to file CA, this is because it's needed to access C, or is it just trickery by a devious guest to corrupt storage? This could be handled so that instead of naming the backing file, QEMU asks for a descriptor for the backing file by presenting the descriptor to main file C, but I think the real solution is that libvirt should handle the storage formats completely and it should present QEMU with only a raw file like interface (read/write/seek) for the data. Then any backing files would be handled within libvirt. Performance could suffer, though.

On 07/20/2011 11:20 AM, Blue Swirl wrote:
There could still be some issues: Let's have files A, B, C etc. with backing files AA etc. How would libvirt know that when QEMU wants to write to file CA, this is because it's needed to access C, or is it just trickery by a devious guest to corrupt storage?
The fix for CVE-2010-2238 already deals with this: if primary image C refers to backing file CA of raw format, but does not state what file format CA contains, then a malicious guest can modify the contents of CA to appear to be yet another qcow2 image. At which point, if libvirt follows the backing file specified in CA, then yes, the malicious guest really can cause libvirt to expose arbitrary file CB for manipulation by the guest. But that security hole was already plugged - by default, libvirt refuses to probe backing files parsed from qcow2 headers for file format, but instead requires the outer qcow2 header to also include the a file format designation for the backing file. At which point, you then have a safe chain: if C refers to CA, then libvirt knows that both C and CA are essential to the storage presented by giving qemu the file name C, and the guest will already be modifying CA, but there is no storage corruption involved. That is, as long as libvirt can already accurately read the chain of backing files from any starting point, then it can hand that entire chain of backing files (whether by the topmost file name as it does now, or whether by a series of fds as is being proposed) to qemu.
This could be handled so that instead of naming the backing file, QEMU asks for a descriptor for the backing file by presenting the descriptor to main file C, but I think the real solution is that libvirt should handle the storage formats completely and it should present QEMU with only a raw file like interface (read/write/seek) for the data. Then any backing files would be handled within libvirt. Performance could suffer, though.
The monitor interface was not designed to throw the read()/write()/seek() burden back on libvirt, and indeed that would kill performance so it is a non-starter idea. All we need for security is the open() burden to be shifted out of qemu and into libvirt. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On Wed, Jul 20, 2011 at 8:41 PM, Eric Blake <eblake@redhat.com> wrote:
On 07/20/2011 11:20 AM, Blue Swirl wrote:
There could still be some issues: Let's have files A, B, C etc. with backing files AA etc. How would libvirt know that when QEMU wants to write to file CA, this is because it's needed to access C, or is it just trickery by a devious guest to corrupt storage?
The fix for CVE-2010-2238 already deals with this: if primary image C refers to backing file CA of raw format, but does not state what file format CA contains, then a malicious guest can modify the contents of CA to appear to be yet another qcow2 image. At which point, if libvirt follows the backing file specified in CA, then yes, the malicious guest really can cause libvirt to expose arbitrary file CB for manipulation by the guest. But that security hole was already plugged - by default, libvirt refuses to probe backing files parsed from qcow2 headers for file format, but instead requires the outer qcow2 header to also include the a file format designation for the backing file. At which point, you then have a safe chain: if C refers to CA, then libvirt knows that both C and CA are essential to the storage presented by giving qemu the file name C, and the guest will already be modifying CA, but there is no storage corruption involved.
But what if CA is accessed even if C is not? For example, QEMU opens C (to determine CA and write new information about the path), closes it and then requests CA?
That is, as long as libvirt can already accurately read the chain of backing files from any starting point, then it can hand that entire chain of backing files (whether by the topmost file name as it does now, or whether by a series of fds as is being proposed) to qemu.
This could be handled so that instead of naming the backing file, QEMU asks for a descriptor for the backing file by presenting the descriptor to main file C, but I think the real solution is that libvirt should handle the storage formats completely and it should present QEMU with only a raw file like interface (read/write/seek) for the data. Then any backing files would be handled within libvirt. Performance could suffer, though.
The monitor interface was not designed to throw the read()/write()/seek() burden back on libvirt, and indeed that would kill performance so it is a non-starter idea. All we need for security is the open() burden to be shifted out of qemu and into libvirt.
Obviously the interface should be faster than monitor, for example a pair of sockets with some efficient protocol. Monitor could still be used to set up these.

On 07/20/2011 12:00 PM, Blue Swirl wrote:
Let's have files A, B, C etc. with backing files AA etc. How would libvirt know that when QEMU wants to write to file CA, this is because it's needed to access C, or is it just trickery by a devious guest to corrupt storage?
The fix for CVE-2010-2238 already deals with this: if primary image C refers to backing file CA of raw format, but does not state what file format CA contains, then a malicious guest can modify the contents of CA to appear to be yet another qcow2 image. At which point, if libvirt follows the backing file specified in CA, then yes, the malicious guest really can cause libvirt to expose arbitrary file CB for manipulation by the guest. But that security hole was already plugged - by default, libvirt refuses to probe backing files parsed from qcow2 headers for file format, but instead requires the outer qcow2 header to also include the a file format designation for the backing file. At which point, you then have a safe chain: if C refers to CA, then libvirt knows that both C and CA are essential to the storage presented by giving qemu the file name C, and the guest will already be modifying CA, but there is no storage corruption involved.
But what if CA is accessed even if C is not? For example, QEMU opens C (to determine CA and write new information about the path), closes it and then requests CA?
Why is qemu trying to access CA? Either because CA was mentioned as a backing file for C (in which case libvirt already knows about it, because either libvirt handed C to qemu at startup time after already parsing C's headers to learn that CA is a backing file, or because libvirt called the snapshot_blkdev command with the intent of having qemu populate CA with C as its backing file), or because qemu has a bug (in which case, libvirt should refuse the access to CA). Libvirt is already perfectly capable of tracking all files that qemu might need to access, and whether it is qemu or libvirt that does the open() of those files, we can still have libvirt validate whether each request for a file makes sense given the context of all previous files in use from the time the qemu command line was invoked and across all monitor commands in the meantime. On non-NFS solutions, where every file can have a SELinux label, then the security is then present by merely having libvirt relabel all such files to a unique label for that particular qemu process, and SELinux merely enforces that qemu cannot open() anything but what libvirt has already labeled. And since libvirt already knows which files to label in the non-NFS scenario, it already knows which fds to pass in the NFS scenario, at which point the ability to prevent qemu from open()ing an NFS file is a security enhancement. Your question about qemu wanting to use CA is thus answered independently of whether the fd management solution is solved by libvirt handing an fd for CA to qemu prior to any monitor command where qemu will then need to use CA, or whether qemu is taught to asynchronously ask libvirt to open an fd for CA on qemu's behalf. The answer is that libvirt already tracks whether qemu should access CA, and just needs a way to enforce that knowledge. The enforcement already exists for non-NFS via SELinux labels, and the proposal to add fd handling will expand that enforcement to also cover NFS. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On Wed, Jul 20, 2011 at 9:17 PM, Eric Blake <eblake@redhat.com> wrote:
On 07/20/2011 12:00 PM, Blue Swirl wrote:
Let's have files A, B, C etc. with backing files AA etc. How would libvirt know that when QEMU wants to write to file CA, this is because it's needed to access C, or is it just trickery by a devious guest to corrupt storage?
The fix for CVE-2010-2238 already deals with this: if primary image C refers to backing file CA of raw format, but does not state what file format CA contains, then a malicious guest can modify the contents of CA to appear to be yet another qcow2 image. At which point, if libvirt follows the backing file specified in CA, then yes, the malicious guest really can cause libvirt to expose arbitrary file CB for manipulation by the guest. But that security hole was already plugged - by default, libvirt refuses to probe backing files parsed from qcow2 headers for file format, but instead requires the outer qcow2 header to also include the a file format designation for the backing file. At which point, you then have a safe chain: if C refers to CA, then libvirt knows that both C and CA are essential to the storage presented by giving qemu the file name C, and the guest will already be modifying CA, but there is no storage corruption involved.
But what if CA is accessed even if C is not? For example, QEMU opens C (to determine CA and write new information about the path), closes it and then requests CA?
Why is qemu trying to access CA?
Either because CA was mentioned as a backing file for C (in which case libvirt already knows about it, because either libvirt handed C to qemu at startup time after already parsing C's headers to learn that CA is a backing file, or because libvirt called the snapshot_blkdev command with the intent of having qemu populate CA with C as its backing file), or because qemu has a bug (in which case, libvirt should refuse the access to CA).
So no new backing files can be introduced by QEMU after it has started without libvirt knowing it?
Libvirt is already perfectly capable of tracking all files that qemu might need to access, and whether it is qemu or libvirt that does the open() of those files, we can still have libvirt validate whether each request for a file makes sense given the context of all previous files in use from the time the qemu command line was invoked and across all monitor commands in the meantime.
On non-NFS solutions, where every file can have a SELinux label, then the security is then present by merely having libvirt relabel all such files to a unique label for that particular qemu process, and SELinux merely enforces that qemu cannot open() anything but what libvirt has already labeled. And since libvirt already knows which files to label in the non-NFS scenario, it already knows which fds to pass in the NFS scenario, at which point the ability to prevent qemu from open()ing an NFS file is a security enhancement.
Your question about qemu wanting to use CA is thus answered independently of whether the fd management solution is solved by libvirt handing an fd for CA to qemu prior to any monitor command where qemu will then need to use CA, or whether qemu is taught to asynchronously ask libvirt to open an fd for CA on qemu's behalf. The answer is that libvirt already tracks whether qemu should access CA, and just needs a way to enforce that knowledge. The enforcement already exists for non-NFS via SELinux labels, and the proposal to add fd handling will expand that enforcement to also cover NFS.
OK. I think fds would be useful internally in a privilege separation mode in plain QEMU too.

On 07/20/2011 02:01 PM, Blue Swirl wrote:
Either because CA was mentioned as a backing file for C (in which case libvirt already knows about it, because either libvirt handed C to qemu at startup time after already parsing C's headers to learn that CA is a backing file, or because libvirt called the snapshot_blkdev command with the intent of having qemu populate CA with C as its backing file), or because qemu has a bug (in which case, libvirt should refuse the access to CA).
So no new backing files can be introduced by QEMU after it has started without libvirt knowing it?
No, you missed my point. A new backing file can only be introduced by qemu after it has started by libvirt using a finite set of monitor commands. These include disk hotplug (libvirt adds to the list of files known to be accessed by qemu, by reading the image headers of the new disk to be hot-plugged prior to issuing the monitor command), by disk hot-unplug (libvirt revokes the access to the files making up that disk, which it remembers from before the disk was added), and snapshot_blkdev (libvirt is explicitly requesting a new qcow2 file with the old file as the backing image, so it knows the new relationship of files to be accessed by that block device). Since libvirt issued the monitor commands, libvirt always knows which files qemu should be trying to access when servicing those block devices to the guest.
OK. I think fds would be useful internally in a privilege separation mode in plain QEMU too.
Yes, there's more than one reason to add fd support to all possible situations where qemu is currently resorting to open(). -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

Am 20.07.2011 19:20, schrieb Blue Swirl:
On Wed, Jul 20, 2011 at 4:51 PM, Kevin Wolf <kwolf@redhat.com> wrote:
Am 20.07.2011 15:25, schrieb Jes Sorensen:
On 07/20/11 12:01, Kevin Wolf wrote:
Right, we're stuck with the two horros of NFS and selinux, so we need something that gets around the problem. In a sane world we would simply say 'no NFS, no selinux', but as you say that will never happen.
My suggestion of a callback mechanism where libvirt registers the callback with QEMU for open() calls, allowing libvirt to perform the open and return the open file descriptor would get around this problem. To me this sounds more like a problem than a solution. It basically means that during an open (which may even be initiated by a monitor command), you need monitor interaction. It basically means that open becomes asynchronous, and requires clients to deal with that, which sounds at least "interesting"... Also you have to add some magic to all places opening something.
I think if libvirt wants qemu to use an fd instead of a file name, it shouldn't pass a file name but an fd in the first place. Which means that the two that we need are support for an fd: protocol (patches on the list, need review), and a way for libvirt to override the backing file of an image.
The problem is that QEMU will find backing file file names inside the images which it will be unable to open. How do you suggest we get around that?
This is the part with allowing libvirt to override the backing file. Of course, this is not something that we can add with five lines of code, it requires -blockdev.
There could still be some issues: Let's have files A, B, C etc. with backing files AA etc. How would libvirt know that when QEMU wants to write to file CA, this is because it's needed to access C, or is it just trickery by a devious guest to corrupt storage?
This could be handled so that instead of naming the backing file, QEMU asks for a descriptor for the backing file by presenting the descriptor to main file C,
qemu shouldn't ask for anything. libvirt shouldn't give it a filename in the first place. It should do something like this: { "execute": "blockdev_add", "arguments"= { "driver": "fd," "fd": "4", "backing-file": { "driver": "fd," "fd": "5" } }} And then qemu doesn't even have a reason to know that there is something called CA.
but I think the real solution is that libvirt should handle the storage formats completely and it should present QEMU with only a raw file like interface (read/write/seek) for the data. Then any backing files would be handled within libvirt. Performance could suffer, though.
I like your humour. :-) Kevin

On Thu, Jul 21, 2011 at 11:25 AM, Kevin Wolf <kwolf@redhat.com> wrote:
Am 20.07.2011 19:20, schrieb Blue Swirl:
On Wed, Jul 20, 2011 at 4:51 PM, Kevin Wolf <kwolf@redhat.com> wrote:
Am 20.07.2011 15:25, schrieb Jes Sorensen:
On 07/20/11 12:01, Kevin Wolf wrote:
> Right, we're stuck with the two horros of NFS and selinux, so we need > something that gets around the problem. In a sane world we would simply > say 'no NFS, no selinux', but as you say that will never happen. > > My suggestion of a callback mechanism where libvirt registers the > callback with QEMU for open() calls, allowing libvirt to perform the > open and return the open file descriptor would get around this problem. To me this sounds more like a problem than a solution. It basically means that during an open (which may even be initiated by a monitor command), you need monitor interaction. It basically means that open becomes asynchronous, and requires clients to deal with that, which sounds at least "interesting"... Also you have to add some magic to all places opening something.
I think if libvirt wants qemu to use an fd instead of a file name, it shouldn't pass a file name but an fd in the first place. Which means that the two that we need are support for an fd: protocol (patches on the list, need review), and a way for libvirt to override the backing file of an image.
The problem is that QEMU will find backing file file names inside the images which it will be unable to open. How do you suggest we get around that?
This is the part with allowing libvirt to override the backing file. Of course, this is not something that we can add with five lines of code, it requires -blockdev.
There could still be some issues: Let's have files A, B, C etc. with backing files AA etc. How would libvirt know that when QEMU wants to write to file CA, this is because it's needed to access C, or is it just trickery by a devious guest to corrupt storage?
This could be handled so that instead of naming the backing file, QEMU asks for a descriptor for the backing file by presenting the descriptor to main file C,
qemu shouldn't ask for anything. libvirt shouldn't give it a filename in the first place. It should do something like this:
{ "execute": "blockdev_add", "arguments"= { "driver": "fd," "fd": "4", "backing-file": { "driver": "fd," "fd": "5" } }}
And then qemu doesn't even have a reason to know that there is something called CA.
Yes, that's better.
but I think the real solution is that libvirt should handle the storage formats completely and it should present QEMU with only a raw file like interface (read/write/seek) for the data. Then any backing files would be handled within libvirt. Performance could suffer, though.
I like your humour. :-)
Well, for some applications, security is more important than performance or convenience.

On 07/20/2011 04:51 PM, Kevin Wolf wrote:
The problem is that QEMU will find backing file file names inside the images which it will be unable to open. How do you suggest we get around that?
This is the part with allowing libvirt to override the backing file. Of course, this is not something that we can add with five lines of code, it requires -blockdev.
It can be done without blockdev. Have a dictionary that translates filenames, and populate it from the command line (for a bonus, translate a filename to a file descriptor inherited from the caller or passed via the monitor). -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.

Am 22.07.2011 09:36, schrieb Avi Kivity:
On 07/20/2011 04:51 PM, Kevin Wolf wrote:
The problem is that QEMU will find backing file file names inside the images which it will be unable to open. How do you suggest we get around that?
This is the part with allowing libvirt to override the backing file. Of course, this is not something that we can add with five lines of code, it requires -blockdev.
It can be done without blockdev. Have a dictionary that translates filenames, and populate it from the command line (for a bonus, translate a filename to a file descriptor inherited from the caller or passed via the monitor).
Sure, you can always add ugly hacks, but it isn't the right solution that we want to use for all times. However, once we use it, it will show up in the external API and we'll never get rid of it again. Kevin

On Fri, Jul 22, 2011 at 11:11 AM, Kevin Wolf <kwolf@redhat.com> wrote:
Am 22.07.2011 09:36, schrieb Avi Kivity:
On 07/20/2011 04:51 PM, Kevin Wolf wrote:
The problem is that QEMU will find backing file file names inside the images which it will be unable to open. How do you suggest we get around that?
This is the part with allowing libvirt to override the backing file. Of course, this is not something that we can add with five lines of code, it requires -blockdev.
It can be done without blockdev. Have a dictionary that translates filenames, and populate it from the command line (for a bonus, translate a filename to a file descriptor inherited from the caller or passed via the monitor).
Sure, you can always add ugly hacks, but it isn't the right solution that we want to use for all times. However, once we use it, it will show up in the external API and we'll never get rid of it again.
Fully agree. This would also be a highly specific API for QCOW2 and similar formats.

Jes Sorensen <Jes.Sorensen@redhat.com> writes:
On 07/19/11 18:14, Anthony Liguori wrote:
As nice as that sentiment is, it will never fly, because it would be a regression in current behavior. The whole reason that the virt_use_nfs SELinux bool exists is that some people are willing to make the partial security tradeoff. Besides, the use of sVirt via SELinux is more than just open() protection - while the current virt_use_nfs bool makes NFS less secure than otherwise possible, it still gives some nice guarantees to the rest of the qemu process such as passthrough accesses to local pci devices.
Well leaving things at status quo is not making it worse, it just leaves an evil in place.
NFS and SELinux is a fundamental problem with SELinux and NFS. We can piss and moan as much as we want about it but it's reality. SELinux fundamentally requires extended attributes. By the time NFS adds extended attribute support, we'll all be flying around in hover cars.
As terrible as NFS is, people use it all of the time.
It would be nice if libvirt had the ability to make better use of DAC to support isolation. The fact that MAC is the only way you can do isolation between guests is pretty unfortunate. If I could assign specific UIDs to a guest and use that to enforce isolation, it would go a long ways to solving this problem.
Right, we're stuck with the two horros of NFS and selinux, so we need something that gets around the problem. In a sane world we would simply say 'no NFS, no selinux', but as you say that will never happen.
My suggestion of a callback mechanism where libvirt registers the callback with QEMU for open() calls, allowing libvirt to perform the open and return the open file descriptor would get around this problem.
No go, I'm afraid. The problem at hand is how to confine the untrusted QEMU process so it can access exactly the resources the guest configuration specifies, no more, no less. When guest configuration says "use image A", it really means "file A plus certain other resources that are required to use it". For obvious usability reasons, we don't require spelling out "certain other" if we can safely get the information from A. Example: file foo.qcow2 requires its backing file file foo.img. Obviously, the code to get information from A must be trusted. Therefore, it can't run in the untrusted QEMU process. Example: the proposed callback mechanism. Guest configuration says "use image foo.qcow2". Libvirt (or whatever management layer) arranges that the QEMU process can use file foo.qcow2. The QEMU process then uses the callback to gain access to the backing file. Nothing stops it to ask for whatever file it wants. How can libvirt decide whether the file is one of the resources the guest configuration specifies? The only way I can see around having libvirt read resource information from images (preferably via a suitable library provided by QEMU) is to require the administrator to spell out the resouce information. Administrators generally don't fancy spelling out things the computer already knows.

On 07/21/11 15:07, Markus Armbruster wrote:
Jes Sorensen <Jes.Sorensen@redhat.com> writes:
Right, we're stuck with the two horros of NFS and selinux, so we need something that gets around the problem. In a sane world we would simply say 'no NFS, no selinux', but as you say that will never happen.
My suggestion of a callback mechanism where libvirt registers the callback with QEMU for open() calls, allowing libvirt to perform the open and return the open file descriptor would get around this problem.
No go, I'm afraid.
The problem at hand is how to confine the untrusted QEMU process so it can access exactly the resources the guest configuration specifies, no more, no less.
When guest configuration says "use image A", it really means "file A plus certain other resources that are required to use it". For obvious usability reasons, we don't require spelling out "certain other" if we can safely get the information from A.
Example: file foo.qcow2 requires its backing file file foo.img.
Obviously, the code to get information from A must be trusted. Therefore, it can't run in the untrusted QEMU process.
Example: the proposed callback mechanism.
Guest configuration says "use image foo.qcow2". Libvirt (or whatever management layer) arranges that the QEMU process can use file foo.qcow2.
The QEMU process then uses the callback to gain access to the backing file. Nothing stops it to ask for whatever file it wants. How can libvirt decide whether the file is one of the resources the guest configuration specifies?
The only way I can see around having libvirt read resource information from images (preferably via a suitable library provided by QEMU) is to require the administrator to spell out the resouce information. Administrators generally don't fancy spelling out things the computer already knows.
As I pointed out in other parts of the discussion, libvirt must carry a complete list of authorized files/devices that a guest is allowed to access. There is no way around this anyway, and since this is the case, it's not a showstopper for using the callback mechanism. Cheers, Jes

On Tue, Jul 19, 2011 at 04:30:19PM +0200, Jes Sorensen wrote:
On 07/19/11 16:24, Eric Blake wrote:
[adding the libvir-list] On 07/19/2011 08:09 AM, Jes Sorensen wrote:
Urgh, libvirt parsing image files is really unfortunate, it really doesn't give me warm fuzzy feelings :( libvirt really should not know about internals of image formats.
But even if you add new features to qemu to avoid needing this in the future, it doesn't change the past - libvirt will always have to know how to parse image files understood by older qemu, and so as long as libvirt already knows how to do that parsing, we might as well take advantage of it.
What has been done here in the past is plain wrong. Continuing to do it isn't the right thing to do here.
Besides, I feel that having a well-documented file format, so that independent applications can both parse the same file with the same semantics by obeying the file format specification, is a good design goal.
We all know that documentation is rarely uptodate, new features may not get added and libvirt will never be able to keep up. The driver for a file format belongs in QEMU and nowhere else.
This would be possible if QEMU to provide a libblockformat.so library which allowed apps to extract metadata from file formats using a stable API. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 07/19/11 18:47, Daniel P. Berrange wrote:
On Tue, Jul 19, 2011 at 04:30:19PM +0200, Jes Sorensen wrote:
On 07/19/11 16:24, Eric Blake wrote:
Besides, I feel that having a well-documented file format, so that independent applications can both parse the same file with the same semantics by obeying the file format specification, is a good design goal.
We all know that documentation is rarely uptodate, new features may not get added and libvirt will never be able to keep up. The driver for a file format belongs in QEMU and nowhere else.
This would be possible if QEMU to provide a libblockformat.so library which allowed apps to extract metadata from file formats using a stable API.
There is no reason for libvirt or any external process to mess about with the internals of image files. You have the same problem if the image file is encrypted. Jes

On Wed, Jul 20, 2011 at 10:26:49AM +0200, Jes Sorensen wrote:
On 07/19/11 18:47, Daniel P. Berrange wrote:
On Tue, Jul 19, 2011 at 04:30:19PM +0200, Jes Sorensen wrote:
On 07/19/11 16:24, Eric Blake wrote:
Besides, I feel that having a well-documented file format, so that independent applications can both parse the same file with the same semantics by obeying the file format specification, is a good design goal.
We all know that documentation is rarely uptodate, new features may not get added and libvirt will never be able to keep up. The driver for a file format belongs in QEMU and nowhere else.
This would be possible if QEMU to provide a libblockformat.so library which allowed apps to extract metadata from file formats using a stable API.
There is no reason for libvirt or any external process to mess about with the internals of image files. You have the same problem if the image file is encrypted.
Just repeating "libvirt doesn't need todo this" many times doesn't make it true. I have described why we need to read the disk image to determine its backing files ahead of QEMU being launched quite clearly. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 07/20/11 11:38, Daniel P. Berrange wrote:
On Wed, Jul 20, 2011 at 10:26:49AM +0200, Jes Sorensen wrote:
On 07/19/11 18:47, Daniel P. Berrange wrote:
On Tue, Jul 19, 2011 at 04:30:19PM +0200, Jes Sorensen wrote:
On 07/19/11 16:24, Eric Blake wrote:
Besides, I feel that having a well-documented file format, so that independent applications can both parse the same file with the same semantics by obeying the file format specification, is a good design goal.
We all know that documentation is rarely uptodate, new features may not get added and libvirt will never be able to keep up. The driver for a file format belongs in QEMU and nowhere else.
This would be possible if QEMU to provide a libblockformat.so library which allowed apps to extract metadata from file formats using a stable API.
There is no reason for libvirt or any external process to mess about with the internals of image files. You have the same problem if the image file is encrypted.
Just repeating "libvirt doesn't need todo this" many times doesn't make it true. I have described why we need to read the disk image to determine its backing files ahead of QEMU being launched quite clearly.
I have pointed out repeatedly why you do not need to do this. It is horrendous that libvirt didn't seek a proper solution to this problem before going on and implementing this current mess. Jes

On 07/19/2011 11:47 AM, Daniel P. Berrange wrote:
This would be possible if QEMU to provide a libblockformat.so library which allowed apps to extract metadata from file formats using a stable API.
I'm in 100% agreement that we need to provide the equivalent of a libblockformat.so down the road. But the block layer needs some work before we can support a stable interface. Regards, Anthony Liguori
Daniel

On 07/19/2011 11:47 AM, Daniel P. Berrange wrote:
On Tue, Jul 19, 2011 at 04:30:19PM +0200, Jes Sorensen wrote:
On 07/19/11 16:24, Eric Blake wrote:
[adding the libvir-list] On 07/19/2011 08:09 AM, Jes Sorensen wrote:
Urgh, libvirt parsing image files is really unfortunate, it really doesn't give me warm fuzzy feelings :( libvirt really should not know about internals of image formats.
But even if you add new features to qemu to avoid needing this in the future, it doesn't change the past - libvirt will always have to know how to parse image files understood by older qemu, and so as long as libvirt already knows how to do that parsing, we might as well take advantage of it.
What has been done here in the past is plain wrong. Continuing to do it isn't the right thing to do here.
Besides, I feel that having a well-documented file format, so that independent applications can both parse the same file with the same semantics by obeying the file format specification, is a good design goal.
We all know that documentation is rarely uptodate, new features may not get added and libvirt will never be able to keep up. The driver for a file format belongs in QEMU and nowhere else.
This would be possible if QEMU to provide a libblockformat.so library which allowed apps to extract metadata from file formats using a stable API.
How wrong would it be to call out to qemu-img to handle this instead? Seems like a more stable interface (assuming the output of `qemu-img info` is treated as an API of sorts, or perhaps some other output mode is added to qemu-img that is considered stable).
Daniel
participants (11)
-
Anthony Liguori
-
Avi Kivity
-
Blue Swirl
-
Daniel P. Berrange
-
Eric Blake
-
Jes Sorensen
-
Kevin Wolf
-
Markus Armbruster
-
Michael Roth
-
Nicolas Sebrecht
-
Stefan Hajnoczi