[libvirt] [PATCH] docs: add a mention for start a vm with rawio = 'yes'
When we start a VM that has rawio = 'yes' set but no file capabilities set
on the qemu binary, the qemu process still cannot use the capability
(CAP_SYS_RAWIO), and /proc/pidofqemu/status looks like this:

    CapInh: 0000000000020000
    CapPrm: 0000000000000000
    CapEff: 0000000000000000
    CapBnd: 0000001fffffffff

This is because we do not set file capabilities for qemu (see man 7
capabilities). Although Laine mentioned this in commit e11451, I think it
would be good to add it to the docs.

Signed-off-by: Luyao Huang <lhuang@redhat.com>
---
 docs/formatdomain.html.in | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index fb0a0d1..2bcb59d 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -1886,7 +1886,10 @@
  than that (per-process basis, affects all the domain disks). To
  confine the capability as much as possible for QEMU driver as
  this stage, <code>sgio</code> is recommended, it's more
- secure than <code>rawio</code>.
+ secure than <code>rawio</code>. If you really want use rawio
+ = 'yes', please also add file caps for qemu (like this
+ 'setcap "cap_sys_rawio+ie" /usr/libexec/qemu-kvm', for more details
+ please see capabilities(7)).
  </dd>
  <dt><code>sgio</code> attribute
  <span class="since">since 1.0.2</span></dt>
-- 
1.8.3.1
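The hex masks in the status output above can be decoded mechanically. The following is a minimal sketch, not part of the patch: the helper names `parse_cap_sets` and `has_cap` are hypothetical, and it assumes CAP_SYS_RAWIO is capability number 17 (which is what the 0x20000 mask in the output corresponds to).

```python
# Sketch only: helper names are hypothetical, not libvirt or kernel APIs.
CAP_SYS_RAWIO = 17  # assumed capability number; 1 << 17 == 0x20000

def parse_cap_sets(status_text):
    """Extract the Cap* hex masks from the text of /proc/<pid>/status."""
    wanted = ("CapInh", "CapPrm", "CapEff", "CapBnd")
    sets = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() in wanted:
            sets[key.strip()] = int(value.strip(), 16)
    return sets

def has_cap(mask, cap):
    """Return True if capability number `cap` is set in `mask`."""
    return bool((mask >> cap) & 1)

# The values reported in the message above.
STATUS = """\
CapInh: 0000000000020000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000001fffffffff
"""

caps = parse_cap_sets(STATUS)
for name, mask in caps.items():
    print(name, "has CAP_SYS_RAWIO:", has_cap(mask, CAP_SYS_RAWIO))
```

With the values from the message, only the inheritable and bounding sets carry the bit; the permitted and effective sets do not, which is the symptom being reported.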
On Mon, Mar 02, 2015 at 06:04:44PM +0800, Luyao Huang wrote:
> When we start a VM that has rawio = 'yes' set but no file capabilities set
> on the qemu binary, the qemu process still cannot use the capability
> (CAP_SYS_RAWIO), and /proc/pidofqemu/status looks like this:
>
>     CapInh: 0000000000020000
>     CapPrm: 0000000000000000
>     CapEff: 0000000000000000
>     CapBnd: 0000001fffffffff
>
> This is because we do not set file capabilities for qemu (see man 7
> capabilities). Although Laine mentioned this in commit e11451, I think it
> would be good to add it to the docs.
This is only true if you are starting the guest under the qemu:///session
URI. In that case I think it is expected that QEMU lacks rawio
capabilities, because the whole point of qemu:///session is that the VM
has no elevated privileges.

In the case of qemu:///system, libvirt should ensure that it does the
right thing in passing on the raw I/O capability flag. If it does not,
then we must fix that in the code, not the docs.

Regards,
Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
On 03/02/2015 06:43 PM, Daniel P. Berrange wrote:
> On Mon, Mar 02, 2015 at 06:04:44PM +0800, Luyao Huang wrote:
>> When we start a VM that has rawio = 'yes' set but no file capabilities set
>> on the qemu binary, the qemu process still cannot use the capability
>> (CAP_SYS_RAWIO), and /proc/pidofqemu/status looks like this:
>>
>>     CapInh: 0000000000020000
>>     CapPrm: 0000000000000000
>>     CapEff: 0000000000000000
>>     CapBnd: 0000001fffffffff
>>
>> This is because we do not set file capabilities for qemu (see man 7
>> capabilities). Although Laine mentioned this in commit e11451, I think it
>> would be good to add it to the docs.
>
> This is only true if you are starting the guest under the qemu:///session
> URI. In that case I think it is expected that QEMU lacks rawio
> capabilities, because the whole point of qemu:///session is that the VM
> has no elevated privileges.
>
> In the case of qemu:///system, libvirt should ensure that it does the
> right thing in passing on the raw I/O capability flag. If it does not,
> then we must fix that in the code, not the docs.
Hmm, what I showed is the test result under qemu:///system. We already set
the right capability flag before we call execv() or execve(). However, in
most cases we run the qemu process as qemu (107), not root (0), so setting
that flag alone does not let qemu use the capability. From
capabilities(7):

    Transformation of capabilities during execve()
        During an execve(2), the kernel calculates the new capabilities
        of the process using the following algorithm:

            P'(permitted) = (P(inheritable) & F(inheritable)) |
                            (F(permitted) & cap_bset)

            P'(effective) = F(effective) ? P'(permitted) : 0

            P'(inheritable) = P(inheritable)    [i.e., unchanged]

        where:

            P         denotes the value of a thread capability set
                      before the execve(2)
            P'        denotes the value of a capability set after the
                      execve(2)
            F         denotes a file capability set
            cap_bset  is the value of the capability bounding set
                      (described below).

So if no file capability is set on the qemu program
(/usr/libexec/qemu-kvm), the qemu process gets these capability flags:

    Uid: 107 107 107 107
    Gid: 107 107 107 107
    ...
    CapInh: 0000000000020000
    CapPrm: 0000000000000000
    CapEff: 0000000000000000
    CapBnd: 0000001fffffffff

and the qemu process does not have the capability, because CapEff is the
set the kernel uses for permission checks.

I think libvirt already does the right thing here, even though the running
qemu process does not have the rawio capability in this case. I do not
think it is a good idea for libvirt to set a file capability on the qemu
program: libvirt is not the only user that calls qemu, and setting a file
capability on the program would affect other callers (although setting a
small file capability would not be a big deal :) ). So I guess it is
better to have users set this themselves rather than have libvirt use
cap_set_file() to do it.

BTW, if we run the qemu process with root (0) uid and gid, the capability
flags look like this:

    ...
    Uid: 0 0 0 0
    Gid: 0 0 0 0
    ...
    CapInh: 0000000000020000
    CapPrm: 0000000000020000
    CapEff: 0000000000020000
    CapBnd: 0000000000020000
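To make the execve() rule quoted above concrete, here is a small sketch that evaluates it for the cases discussed in this thread. It models only the three formulas as quoted; the names `FileCaps`, `ProcCaps` and `caps_after_execve` are hypothetical, and the root case relies on my reading of capabilities(7), namely that for UID 0 the file inheritable and permitted sets are treated as all ones and the file effective bit as set.

```python
from collections import namedtuple

CAP_SYS_RAWIO = 1 << 17      # 0x20000, the bit seen in the status output
FULL_BSET = 0x1fffffffff     # bounding set reported for the non-root case

# Hypothetical containers for illustration only.
FileCaps = namedtuple("FileCaps", "permitted inheritable effective")
ProcCaps = namedtuple("ProcCaps", "permitted inheritable effective")

def caps_after_execve(p_inheritable, file_caps, cap_bset):
    """Apply the quoted capabilities(7) formulas to one execve()."""
    permitted = ((p_inheritable & file_caps.inheritable)
                 | (file_caps.permitted & cap_bset))
    effective = permitted if file_caps.effective else 0
    return ProcCaps(permitted, p_inheritable, effective)

# Case 1: qemu binary without any file capabilities (uid 107).
no_file_caps = FileCaps(permitted=0, inheritable=0, effective=False)
plain = caps_after_execve(CAP_SYS_RAWIO, no_file_caps, FULL_BSET)

# Case 2: after 'setcap "cap_sys_rawio+ie"' on the binary (uid 107).
with_setcap = FileCaps(permitted=0, inheritable=CAP_SYS_RAWIO, effective=True)
fixed = caps_after_execve(CAP_SYS_RAWIO, with_setcap, FULL_BSET)

# Case 3: running as root; assumption: file sets act as all ones for UID 0.
root_file = FileCaps(permitted=FULL_BSET, inheritable=FULL_BSET, effective=True)
as_root = caps_after_execve(CAP_SYS_RAWIO, root_file, cap_bset=CAP_SYS_RAWIO)

for label, caps in (("no file caps", plain), ("+ie file caps", fixed), ("root", as_root)):
    print(f"{label}: CapPrm={caps.permitted:016x} CapEff={caps.effective:016x}")
```

Under these assumptions the first and third cases reproduce the CapPrm and CapEff values shown above, and the second case shows why the "+ie" file capability is enough: the inheritable bit libvirt keeps is intersected with the file's inheritable set, and the file's effective flag raises it into the effective set.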
> Regards,
> Daniel

Thanks,
Luyao
On 03/02/2015 05:43 AM, Daniel P. Berrange wrote:
> On Mon, Mar 02, 2015 at 06:04:44PM +0800, Luyao Huang wrote:
>> When we start a VM that has rawio = 'yes' set but no file capabilities set
>> on the qemu binary, the qemu process still cannot use the capability
>> (CAP_SYS_RAWIO), and /proc/pidofqemu/status looks like this:
>>
>>     CapInh: 0000000000020000
>>     CapPrm: 0000000000000000
>>     CapEff: 0000000000000000
>>     CapBnd: 0000001fffffffff
>>
>> This is because we do not set file capabilities for qemu (see man 7
>> capabilities). Although Laine mentioned this in commit e11451, I think it
>> would be good to add it to the docs.
>
> This is only true if you are starting the guest under the qemu:///session
> URI. In that case I think it is expected that QEMU lacks rawio
> capabilities, because the whole point of qemu:///session is that the VM
> has no elevated privileges.
>
> In the case of qemu:///system, libvirt should ensure that it does the
> right thing in passing on the raw I/O capability flag. If it does not,
> then we must fix that in the code, not the docs.
libvirt does do the right thing, as much as can be done. The commit Luyao
references above has a summary of what I learned at the time I wrote this
code. (I don't remember who I verified that information with after my
experiments failed to get the cap set, but it was somebody who understands
capability bits better than me :-)

Basically, the problem is that, in order for CapPrm and CapEff to have a
bit set, the executable file (e.g. qemu-x86_64) itself must have that
capability bit set. So libvirt's choices are:

1) require qemu to set the CAP_SYS_RAWIO bit on all their executables.

   (NB: this is the only way hotplug of devices requiring CAP_SYS_RAWIO
   can work, since you can't *add* a capability to a process once it has
   been removed.)

   (NB2: this is really beyond reasonable, since the vast majority of
   domains don't need CAP_SYS_RAWIO, and it's not reasonable to give all
   of them a larger exposure to potential security problems just in case
   somebody someday might want to hotplug a scsi device with rawio.)

2) keep track of how many active domains require CAP_SYS_RAWIO for each
   qemu binary, and set/clear that bit for the binary as required (still
   unacceptable: while we change the permissions/ownership of disk images
   and sockets, I don't think we should be changing the capability bits
   of system binaries in /usr/bin).

3) require the admin to set the CAP_SYS_RAWIO cap for the qemu binary if
   they are going to use it.

In all cases, libvirt still needs to keep the CAP_SYS_RAWIO cap for the
qemu processes that will actually use it, but this is required *in
addition to*, not instead of, setting the bit for the file.

As ugly and inconvenient as it is, setting the cap bit on the qemu
executable really is necessary to use CAP_SYS_RAWIO (unless my information
and experiments were wrong), so I think it's reasonable to add this note
(or something equivalent) to the documentation. (I should have done so at
the time, but as usual was thinking more about the code than about
documenting what it did.)
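For an admin following option 3, it can help to confirm that the bit really landed on the binary. The sketch below decodes the raw value of the `security.capability` extended attribute, which is where Linux stores file capabilities. It is an illustration under stated assumptions: the layout follows my understanding of `struct vfs_cap_data` in linux/capability.h (revision 2 or 3), the function name `decode_file_caps` is hypothetical, and the sample bytes are constructed by hand rather than read from a real qemu binary.

```python
import struct

CAP_SYS_RAWIO = 17                    # assumed capability number
VFS_CAP_REVISION_MASK = 0xFF000000    # constants as I recall them from
VFS_CAP_REVISION_2 = 0x02000000       # linux/capability.h; treat them as
VFS_CAP_REVISION_3 = 0x03000000       # assumptions and verify locally
VFS_CAP_FLAGS_EFFECTIVE = 0x000001

def decode_file_caps(raw):
    """Decode a revision 2/3 security.capability xattr value (hypothetical helper)."""
    (magic_etc,) = struct.unpack_from("<I", raw, 0)
    revision = magic_etc & VFS_CAP_REVISION_MASK
    if revision not in (VFS_CAP_REVISION_2, VFS_CAP_REVISION_3):
        raise ValueError("unsupported file capability revision: %#x" % revision)
    # Two little-endian (permitted, inheritable) pairs: low then high 32 bits.
    perm_lo, inh_lo, perm_hi, inh_hi = struct.unpack_from("<4I", raw, 4)
    return {
        "permitted": perm_lo | (perm_hi << 32),
        "inheritable": inh_lo | (inh_hi << 32),
        "effective": bool(magic_etc & VFS_CAP_FLAGS_EFFECTIVE),
    }

# Hand-built value equivalent to "cap_sys_rawio+ie" (not read from a real file).
sample = struct.pack("<5I", VFS_CAP_REVISION_2 | VFS_CAP_FLAGS_EFFECTIVE,
                     0, 1 << CAP_SYS_RAWIO, 0, 0)
decoded = decode_file_caps(sample)
rawio_inheritable = bool(decoded["inheritable"] >> CAP_SYS_RAWIO & 1)
print(decoded, rawio_inheritable)
```

On a live Linux system the raw bytes could be obtained with `os.getxattr(path, "security.capability")`, which raises `OSError` when the file carries no capabilities; the `getcap` tool from libcap reports the same information in text form.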
participants (4)
- Daniel P. Berrange
- Laine Stump
- lhuang
- Luyao Huang