On 03/02/2015 05:43 AM, Daniel P. Berrange wrote:
> On Mon, Mar 02, 2015 at 06:04:44PM +0800, Luyao Huang wrote:
>> When we start a vm which have rawio = 'yes' settings without
>> any file caps settings for qemu, qemu process still cannot use
>> this caps (CAP_SYS_RAWIO) and the /proc/pidofqemu/status like
>> this:
>>
>> CapInh: 0000000000020000
>> CapPrm: 0000000000000000
>> CapEff: 0000000000000000
>> CapBnd: 0000001fffffffff
>>
>> this is because we do not set file caps for qemu (see man 7
>> capabilities), although laine have mentioned this in commit
>> e11451, i think it will be good if we add this in docs.
>
> This is only true if you are starting the guest under the
> qemu:///session URI. In such a case I think it is expected
> that the QEMU lacks rawio capabilities, because the whole
> point of qemu:///session is that the VM has no elevated
> privileges.
>
> In the case of qemu:///system libvirt should ensure that
> it does the right thing with passing on raw io capability
> flag. If it does not, then we must fix that in the code,
> not the docs.

libvirt does do the right thing, as far as it can. The commit Luyao
references above has a summary of what I learned at the time I wrote this
code. (I don't remember who I verified that information with after my
experiments failed to get the cap set, but it was somebody who
understands capability bits better than me :-)
Basically, the problem is that, in order for CapPrm and CapEff to have a
bit set after exec, the executable file (e.g. qemu-system-x86_64) itself
must have that capability bit set as a file capability. So libvirt's
choices are:

1) require qemu to set the CAP_SYS_RAWIO bit on all their executables
   (NB: this is the only way hotplug of devices requiring CAP_SYS_RAWIO
   can work, since you can't *add* a capability to a process once it has
   been removed) (NB2: This is really beyond reasonable, since the vast
   majority of domains don't need CAP_SYS_RAWIO and it's not reasonable
   to give all of them a larger exposure to potential security problems
   just in case somebody someday might want to hotplug a scsi device
   with rawio)

2) keep track of how many active domains require CAP_SYS_RAWIO for each
   qemu binary, and set/clear that bit for the binary as required (still
   unacceptable - while we change the permissions/ownership of disk
   images and sockets, I don't think we should be changing the
   capability bits of system binaries in /usr/bin)

3) require the admin to set the CAP_SYS_RAWIO cap for the qemu binary if
   they are going to use it.

In all cases, libvirt still needs to keep the CAP_SYS_RAWIO cap for the
qemu processes that will actually use it, but this is something required
*in addition to*, not instead of, setting the bit for the file.
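
To make the "in addition to" point concrete, here is a rough Python sketch
of the rule from capabilities(7) that the kernel applies on execve(). It
assumes a non-root process and ignores ambient capabilities and the
special-casing of root. The function and variable names are made up for
illustration; they are not libvirt or kernel APIs.

```python
# Sketch of the execve() capability transformation described in
# capabilities(7), for a non-root process, ignoring ambient capabilities.
# All names here are illustrative only.

CAP_SYS_RAWIO = 1 << 17   # capability number 17 in <linux/capability.h>
BOUNDING = 0x1FFFFFFFFF   # CapBnd value from the status output quoted above


def caps_after_exec(p_inh, bounding, f_inh, f_perm, f_eff):
    """Return (inheritable, permitted, effective) masks after execve()."""
    # A bit becomes permitted only if the *file* grants it, either via
    # its inheritable set (and the process still holds it as inheritable)
    # or via its permitted set (limited by the bounding set).
    new_perm = (p_inh & f_inh) | (f_perm & bounding)
    # The file's effective flag decides whether permitted bits are raised.
    new_eff = new_perm if f_eff else 0
    return p_inh, new_perm, new_eff


# Process keeps CAP_SYS_RAWIO inheritable, binary has no file caps.
no_file_caps = caps_after_exec(CAP_SYS_RAWIO, BOUNDING, 0, 0, False)

# Same process, binary has cap_sys_rawio inheritable + effective flag.
with_file_caps = caps_after_exec(
    CAP_SYS_RAWIO, BOUNDING, CAP_SYS_RAWIO, 0, True)

# File cap is set, but the process dropped the capability before exec.
dropped_by_parent = caps_after_exec(0, BOUNDING, CAP_SYS_RAWIO, 0, True)

for label, (inh, prm, eff) in [
    ("no file caps", no_file_caps),
    ("with file caps", with_file_caps),
    ("dropped by parent", dropped_by_parent),
]:
    print(f"{label}: CapInh={inh:016x} CapPrm={prm:016x} CapEff={eff:016x}")
```

The first case gives the same CapInh/CapPrm/CapEff values as Luyao's
status output: the bit survives as inheritable but never becomes permitted
or effective. The third case shows why the file cap alone is not enough
when the inheritable file set is used.
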
As ugly and inconvenient as it is, setting the cap bit on the qemu
executable really is necessary to use CAP_SYS_RAWIO (unless my
information + experiments were wrong), so I think it's reasonable to add
this note (or something equivalent) to the documentation. (I should have
done so at the time, but as usual I was thinking more about the code than
about documenting what it did.)
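
For anyone who wants to check what a running qemu process actually ended
up with, something like the following Python sketch could be used. It only
decodes the Cap* lines of /proc/<pid>/status and tests bit 17
(CAP_SYS_RAWIO); the helper name is hypothetical and the sample text is
the status output from Luyao's report.

```python
# Illustrative helper: read the Cap* lines from /proc/<pid>/status text
# and report which capability sets contain CAP_SYS_RAWIO.

CAP_SYS_RAWIO_BIT = 17  # capability number from <linux/capability.h>


def rawio_sets(status_text):
    """Map each Cap* field name to True/False for the CAP_SYS_RAWIO bit."""
    result = {}
    for line in status_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.startswith("Cap"):
            mask = int(value.strip(), 16)  # masks are printed in hex
            result[name] = bool(mask & (1 << CAP_SYS_RAWIO_BIT))
    return result


SAMPLE = """\
CapInh: 0000000000020000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000001fffffffff
"""

sample_result = rawio_sets(SAMPLE)
print(sample_result)
```

For a live process, the same function can be fed the contents of
/proc/<pid>/status, e.g. `rawio_sets(open(f"/proc/{pid}/status").read())`.
If CapEff comes back False for a domain configured with rawio='yes', the
file capability on the qemu binary is the first thing I would check.
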