On Mon, Apr 4, 2011 at 11:47 AM, Daniel P. Berrange <berrange(a)redhat.com> wrote:
On Sun, Apr 03, 2011 at 07:06:17PM +0100, Stefan Hajnoczi wrote:
> On Sun, Apr 3, 2011 at 2:12 PM, Blue Swirl <blauwirbel(a)gmail.com> wrote:
> > On Sun, Apr 3, 2011 at 2:57 PM, Stefan Hajnoczi <stefanha(a)gmail.com> wrote:
> >> On Tue, Mar 29, 2011 at 8:04 PM, Stefan Hajnoczi
> >> <stefanha(a)linux.vnet.ibm.com> wrote:
> >>> Piggy-back on the guest CD-ROM polling to poll on the host. Open and
> >>> close the host CD-ROM file descriptor to ensure we read the new size and
> >>> not a stale size.
> >>>
> >>> Two things are going on here:
> >>>
> >>> 1. If hald/udisks is not already polling CD-ROMs on the host then
> >>> re-opening the CD-ROM causes the host to read the new medium's size.
> >>>
> >>> 2. There is a bug in Linux which means the CD-ROM file descriptor must
> >>> be re-opened in order for lseek(2) to see the new size. The
> >>> inode size gets out of sync with the underlying device (which you can
> >>> confirm by checking that /sys/block/sr0/size and lseek(2) do not
> >>> match after media change). I have raised this with the
> >>> maintainers but we need a workaround for the foreseeable future.
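The mismatch described in point 2 can be checked by hand. Below is a minimal sketch, not taken from the patch: it reads a sysfs size attribute (which Linux reports in 512-byte sectors), asks an already-open descriptor for its size via lseek(2), and compares the two. The function names sysfs_size_bytes(), fd_size_bytes() and size_is_stale() are hypothetical helpers made up for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read a sysfs "size" attribute (a count of
 * 512-byte sectors) and return the size in bytes, or -1 on error. */
static long long sysfs_size_bytes(const char *sysfs_path)
{
    unsigned long long sectors;
    FILE *f;
    int n;

    assert(sysfs_path != NULL);
    f = fopen(sysfs_path, "r");
    if (!f) {
        return -1;
    }
    n = fscanf(f, "%llu", &sectors);
    fclose(f);
    return n == 1 ? (long long)(sectors * 512ULL) : -1;
}

/* Hypothetical helper: size in bytes as seen through an already-open
 * descriptor, or -1 on error. Note that this moves the file offset. */
static long long fd_size_bytes(int fd)
{
    off_t end = lseek(fd, 0, SEEK_END);

    return end == (off_t)-1 ? -1 : (long long)end;
}

/* Returns 1 if the descriptor's size disagrees with sysfs (stale),
 * 0 if they agree, and -1 if either size could not be read. */
static int size_is_stale(int fd, const char *sysfs_path)
{
    long long from_sysfs = sysfs_size_bytes(sysfs_path);
    long long from_fd = fd_size_bytes(fd);

    if (from_sysfs < 0 || from_fd < 0) {
        return -1;
    }
    return from_sysfs != from_fd;
}
```

With a descriptor that was opened before the media change, calling size_is_stale(fd, "/sys/block/sr0/size") would be expected to return 1 on an affected kernel, and 0 again once the device node has been closed and re-opened.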
> >>>
> >>> Note that these changes are all in a #ifdef __linux__ section.
> >>>
> >>> Signed-off-by: Stefan Hajnoczi <stefanha(a)linux.vnet.ibm.com>
> >>> ---
> >>> block/raw-posix.c | 26 ++++++++++++++++++++++----
> >>> 1 files changed, 22 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/block/raw-posix.c b/block/raw-posix.c
> >>> index 6b72470..8b5205c 100644
> >>> --- a/block/raw-posix.c
> >>> +++ b/block/raw-posix.c
> >>> @@ -1238,10 +1238,28 @@ static int cdrom_is_inserted(BlockDriverState *bs)
> >>>      BDRVRawState *s = bs->opaque;
> >>>      int ret;
> >>>
> >>> -    ret = ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT);
> >>> -    if (ret == CDS_DISC_OK)
> >>> -        return 1;
> >>> -    return 0;
> >>> +    /*
> >>> +     * Close the file descriptor if no medium is present and open it to poll
> >>> +     * again. This ensures the medium size is refreshed. If the file
> >>> +     * descriptor is kept open the size can become stale. This is essentially
> >>> +     * replicating CD-ROM polling but is driven by the guest. As the guest
> >>> +     * polls, we poll the host.
> >>> +     */
> >>> +
> >>> +    if (s->fd == -1) {
> >>> +        s->fd = qemu_open(bs->filename, s->open_flags, 0644);
> >>> +        if (s->fd < 0) {
> >>> +            return 0;
> >>> +        }
> >>> +    }
> >>> +
> >>> +    ret = (ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT) == CDS_DISC_OK);
> >>> +
> >>> +    if (!ret) {
> >>> +        close(s->fd);
> >>> +        s->fd = -1;
> >>> +    }
> >>> +    return ret;
> >>>  }
> >>>
> >>> static int cdrom_eject(BlockDriverState *bs, int eject_flag)
> >>> --
> >>> 1.7.4.1
> >>>
> >>>
> >>>
> >>
> >> There is an issue with reopening host devices in QEMU when running
> >> under libvirt. It appears that libvirt chowns image files (including
> >> device nodes) so that the launched QEMU process can access them.
> >>
> >> Unfortunately after media change on host devices udev will reset the
> >> ownership of the device node. This causes open(2) to fail with EACCES
> >> since the QEMU process does not have the right uid/gid/groups and
> >> libvirt is unaware that the file's ownership has changed.
> >>
> >> In order for media change to work with Linux host CD-ROM it is
> >> necessary to reopen the file (otherwise the inode size will not
> >> refresh, this is an issue with existing kernels).
> >>
> >> How can libvirt's security model be made to support this case? In
> >> theory udev could be temporarily configured with libvirt permissions
> >> for the CD-ROM device while passed through to the guest, but is that
> >> feasible?
> >
> > How about something like this: Add an explicit reopen method to
> > BlockDriver. Make a special block device for passed file descriptors.
> > Pass descriptors in libvirt for CD-ROMs instead of the device paths.
> > The reopen method for file descriptors should notify libvirt about
> > need to pass a reopened descriptor and then block all accesses until a
> > new descriptor is available. This should also solve your earlier
> > problem.
>
> I'm hoping libvirt's behavior can be made to just work rather than
> adding new features to QEMU. But perhaps passing file descriptors is
> useful for more than just reopening host devices. This would
> basically be a privilege separation model where the QEMU process isn't
> able to open files itself but can request libvirt to open them on its
> behalf.
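For reference, the descriptor-passing idea above is usually built on SCM_RIGHTS ancillary data over a Unix-domain socket. The following is only a rough sketch of that mechanism, assuming a connected AF_UNIX socket between the two processes; send_fd() and recv_fd() are hypothetical names, and this is not how QEMU or libvirt implement anything today.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send one open descriptor over a connected
 * Unix-domain socket. Returns 0 on success, -1 on error. */
static int send_fd(int sock, int fd)
{
    char byte = 0;   /* at least one byte of real data must accompany it */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr hdr;   /* forces correct alignment of buf */
        char buf[CMSG_SPACE(sizeof(int))];
    } ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    assert(fd >= 0);
    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor sent by send_fd().
 * Returns the new descriptor, or -1 on error. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
        return -1;
    }
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

In the model discussed here, the privileged side would open the CD-ROM device node and call send_fd(), and the unprivileged side would call recv_fd() whenever it needs a fresh descriptor after a media change, so it never has to pass the permission check on the device node itself.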
It is rather frickin' annoying the way udev resets the ownership
when the media merely changes. If it isn't possible to stop udev
doing this, then I think the only practical thing is to use ACLs
instead of user/group ownership. We wanted to switch to ACLs in
libvirt for other reasons already, but it isn't quite as simple
as it sounds[1] so we've not done it just yet.

Daniel

[1] Mostly due to handling upgrades from existing libvirtd while
VMs are running, and coping with filesystems which don't
support ACLs (or have them turned off by mount options)

I haven't peeked at how udev does it but perhaps the ACLs will be lost
or reset across media change too?

Daniel, do you know someone from the udev side who we should include
in this discussion?

Stefan