[libvirt] [PATCH 0/5] qemu: invoke qemu-bridge-helper from libvirtd

The <interface type='bridge'> is working mostly because of a bad design decision in Linux. Ideally, QEMU would run with an empty capability bounding set and would not be able to do any privileged operation (not even by running a helper program). This is not the case because dropping capabilities from the bounding set requires a capability of its own, CAP_SETPCAP; thus QEMU does *not* run with an empty bounding set if invoked via qemu:///session. This series lets libvirtd invoke the privileged helper program on its own, which is a cleaner design that would work even if the above Linux bug was not there. Also, this adds a <target dev='tap0'/> element to the XML of an active domain using <interface type='bridge'>. libvirt now needs to know the path to the helper (patch 3), and must not set permitted/effective capabilities on children when running unprivileged (patches 1/2). Apart from this, the recvfd and virCommand APIs make the task almost trivial. Paolo Bonzini (5): util: simplify virSetUIDGIDWithCaps util: allow using virCommandAllowCap with setuid helpers qemu_conf: add new configuration key bridge_helper virnetdevtap: add virNetDevTapGetName qemu: launch bridge helper from libvirtd src/libvirt_private.syms | 1 + src/qemu/libvirtd_qemu.aug | 1 + src/qemu/qemu.conf | 8 +++ src/qemu/qemu_command.c | 133 +++++++++++++++++++++++++++---------- src/qemu/qemu_command.h | 1 - src/qemu/qemu_conf.c | 3 + src/qemu/qemu_conf.h | 1 + src/qemu/qemu_hotplug.c | 25 +++---- src/qemu/test_libvirtd_qemu.aug.in | 1 + src/util/virnetdevtap.c | 33 +++++++++ src/util/virnetdevtap.h | 3 + src/util/virutil.c | 36 +++++++--- 12 files changed, 183 insertions(+), 63 deletions(-) -- 1.8.1.4

The need_prctl variable is not really needed. If it is false, capng_apply will be called twice with the same set, causing a little extra work but no problem. This keeps the code a bit simpler. It is also clearer to invoke capng_apply(CAPNG_SELECT_BOUNDS) separately, to make sure it is done while we have CAP_SETPCAP. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- src/util/virutil.c | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/src/util/virutil.c b/src/util/virutil.c index 42b4295..36e6cee 100644 --- a/src/util/virutil.c +++ b/src/util/virutil.c @@ -3003,7 +3003,7 @@ virSetUIDGIDWithCaps(uid_t uid, gid_t gid, unsigned long long capBits, { int ii, capng_ret, ret = -1; bool need_setgid = false, need_setuid = false; - bool need_prctl = false, need_setpcap = false; + bool need_setpcap = false; /* First drop all caps (unless the requested uid is "unchanged" or * root and clearExistingCaps wasn't requested), then add back @@ -3043,17 +3043,15 @@ virSetUIDGIDWithCaps(uid_t uid, gid_t gid, unsigned long long capBits, capng_update(CAPNG_ADD, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETPCAP); # endif - need_prctl = capBits || need_setgid || need_setuid || need_setpcap; - /* Tell system we want to keep caps across uid change */ - if (need_prctl && prctl(PR_SET_KEEPCAPS, 1, 0, 0, 0)) { + if (prctl(PR_SET_KEEPCAPS, 1, 0, 0, 0)) { virReportSystemError(errno, "%s", _("prctl failed to set KEEPCAPS")); goto cleanup; } /* Change to the temp capabilities */ - if ((capng_ret = capng_apply(CAPNG_SELECT_BOTH)) < 0) { + if ((capng_ret = capng_apply(CAPNG_SELECT_CAPS)) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("cannot apply process capabilities %d"), capng_ret); goto cleanup; @@ -3063,12 +3061,18 @@ virSetUIDGIDWithCaps(uid_t uid, gid_t gid, unsigned long long capBits, goto cleanup; /* Tell it we are done keeping capabilities */ - if (need_prctl && prctl(PR_SET_KEEPCAPS, 0, 0, 0, 0)) { + if (prctl(PR_SET_KEEPCAPS, 0, 0, 0, 0)) { virReportSystemError(errno, "%s", _("prctl failed to reset KEEPCAPS")); goto cleanup; } + /* Set bounding set while we have CAP_SETPCAP. Unfortunately we cannot + * do this if we failed to get the capability above, so ignore the + * return value. + */ + capng_apply(CAPNG_SELECT_BOUNDS); + /* Drop the caps that allow setuid/gid (unless they were requested) */ if (need_setgid) capng_update(CAPNG_DROP, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETGID); @@ -3078,7 +3082,7 @@ virSetUIDGIDWithCaps(uid_t uid, gid_t gid, unsigned long long capBits, if (need_setpcap) capng_update(CAPNG_DROP, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETPCAP); - if (need_prctl && ((capng_ret = capng_apply(CAPNG_SELECT_BOTH)) < 0)) { + if (((capng_ret = capng_apply(CAPNG_SELECT_CAPS)) < 0)) { virReportError(VIR_ERR_INTERNAL_ERROR, _("cannot apply process capabilities %d"), capng_ret); ret = -1; -- 1.8.1.4

On 03/25/2013 08:25 AM, Paolo Bonzini wrote:
The need_prctl variable is not really needed. If it is false, capng_apply will be called twice with the same set, causing a little extra work but no problem. This keeps the code a bit simpler.
It is also clearer to invoke capng_apply(CAPNG_SELECT_BOUNDS) separately, to make sure it is done while we have CAP_SETPCAP.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- src/util/virutil.c | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-)
/* Change to the temp capabilities */ - if ((capng_ret = capng_apply(CAPNG_SELECT_BOTH)) < 0) { + if ((capng_ret = capng_apply(CAPNG_SELECT_CAPS)) < 0) {
Beforehand, we limited both caps and bounding set, with an overlarge set, now you are limiting just caps...
virReportError(VIR_ERR_INTERNAL_ERROR, _("cannot apply process capabilities %d"), capng_ret); goto cleanup; @@ -3063,12 +3061,18 @@ virSetUIDGIDWithCaps(uid_t uid, gid_t gid, unsigned long long capBits, goto cleanup;
/* Tell it we are done keeping capabilities */ - if (need_prctl && prctl(PR_SET_KEEPCAPS, 0, 0, 0, 0)) { + if (prctl(PR_SET_KEEPCAPS, 0, 0, 0, 0)) { virReportSystemError(errno, "%s", _("prctl failed to reset KEEPCAPS")); goto cleanup; }
+ /* Set bounding set while we have CAP_SETPCAP. Unfortunately we cannot + * do this if we failed to get the capability above, so ignore the + * return value. + */ + capng_apply(CAPNG_SELECT_BOUNDS);
...and then separately limiting bounds, but still while having an overlarge set.
+ /* Drop the caps that allow setuid/gid (unless they were requested) */ if (need_setgid) capng_update(CAPNG_DROP, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETGID); @@ -3078,7 +3082,7 @@ virSetUIDGIDWithCaps(uid_t uid, gid_t gid, unsigned long long capBits, if (need_setpcap) capng_update(CAPNG_DROP, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETPCAP);
Here, the set is now pruned to size...
- if (need_prctl && ((capng_ret = capng_apply(CAPNG_SELECT_BOTH)) < 0)) { + if (((capng_ret = capng_apply(CAPNG_SELECT_CAPS)) < 0)) {
...but you are now only limiting caps, not the bounding set. Is that correct? Does this need to be considered for 1.0.4, or can we delay it to post-release? -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On Wed, Mar 27, 2013 at 04:36:38PM -0600, Eric Blake wrote:
On 03/25/2013 08:25 AM, Paolo Bonzini wrote:
The need_prctl variable is not really needed. If it is false, capng_apply will be called twice with the same set, causing a little extra work but no problem. This keeps the code a bit simpler.
It is also clearer to invoke capng_apply(CAPNG_SELECT_BOUNDS) separately, to make sure it is done while we have CAP_SETPCAP.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- src/util/virutil.c | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-)
/* Change to the temp capabilities */ - if ((capng_ret = capng_apply(CAPNG_SELECT_BOTH)) < 0) { + if ((capng_ret = capng_apply(CAPNG_SELECT_CAPS)) < 0) {
Beforehand, we limited both caps and bounding set, with an overlarge set, now you are limiting just caps...
virReportError(VIR_ERR_INTERNAL_ERROR, _("cannot apply process capabilities %d"), capng_ret); goto cleanup; @@ -3063,12 +3061,18 @@ virSetUIDGIDWithCaps(uid_t uid, gid_t gid, unsigned long long capBits, goto cleanup;
/* Tell it we are done keeping capabilities */ - if (need_prctl && prctl(PR_SET_KEEPCAPS, 0, 0, 0, 0)) { + if (prctl(PR_SET_KEEPCAPS, 0, 0, 0, 0)) { virReportSystemError(errno, "%s", _("prctl failed to reset KEEPCAPS")); goto cleanup; }
+ /* Set bounding set while we have CAP_SETPCAP. Unfortunately we cannot + * do this if we failed to get the capability above, so ignore the + * return value. + */ + capng_apply(CAPNG_SELECT_BOUNDS);
...and then separately limiting bounds, but still while having an overlarge set.
+ /* Drop the caps that allow setuid/gid (unless they were requested) */ if (need_setgid) capng_update(CAPNG_DROP, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETGID); @@ -3078,7 +3082,7 @@ virSetUIDGIDWithCaps(uid_t uid, gid_t gid, unsigned long long capBits, if (need_setpcap) capng_update(CAPNG_DROP, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETPCAP);
Here, the set is now pruned to size...
- if (need_prctl && ((capng_ret = capng_apply(CAPNG_SELECT_BOTH)) < 0)) { + if (((capng_ret = capng_apply(CAPNG_SELECT_CAPS)) < 0)) {
...but you are now only limiting caps, not the bounding set. Is that correct?
This method is derived from code in libcap-ng capng_change_id. Paulo's changes actually make the libvirt code closer to what capng_change_id does, so I think it is OK.
Does this need to be considered for 1.0.4, or can we delay it to post-release?
I think we can wait for this whole series - it is really feature work rather than critical bugfix. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

/* Change to the temp capabilities */ - if ((capng_ret = capng_apply(CAPNG_SELECT_BOTH)) < 0) { + if ((capng_ret = capng_apply(CAPNG_SELECT_CAPS)) < 0) {
Beforehand, we limited both caps and bounding set, with an overlarge set, now you are limiting just caps...
virReportError(VIR_ERR_INTERNAL_ERROR, _("cannot apply process capabilities %d"), capng_ret); goto cleanup; @@ -3063,12 +3061,18 @@ virSetUIDGIDWithCaps(uid_t uid, gid_t gid, unsigned long long capBits, goto cleanup;
/* Tell it we are done keeping capabilities */ - if (need_prctl && prctl(PR_SET_KEEPCAPS, 0, 0, 0, 0)) { + if (prctl(PR_SET_KEEPCAPS, 0, 0, 0, 0)) { virReportSystemError(errno, "%s", _("prctl failed to reset KEEPCAPS")); goto cleanup; }
+ /* Set bounding set while we have CAP_SETPCAP. Unfortunately we cannot + * do this if we failed to get the capability above, so ignore the + * return value. + */ + capng_apply(CAPNG_SELECT_BOUNDS);
...and then separately limiting bounds, but still while having an overlarge set.
+ /* Drop the caps that allow setuid/gid (unless they were requested) */ if (need_setgid) capng_update(CAPNG_DROP, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETGID); @@ -3078,7 +3082,7 @@ virSetUIDGIDWithCaps(uid_t uid, gid_t gid, unsigned long long capBits, if (need_setpcap) capng_update(CAPNG_DROP, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETPCAP);
Here, the set is now pruned to size...
- if (need_prctl && ((capng_ret = capng_apply(CAPNG_SELECT_BOTH)) < 0)) { + if (((capng_ret = capng_apply(CAPNG_SELECT_CAPS)) < 0)) {
...but you are now only limiting caps, not the bounding set. Is that correct?
Yes, the code after capng(CAPNG_SELECT_BOUNDS) does not affect the bounding set.
Does this need to be considered for 1.0.4, or can we delay it to post-release?
Post-release, absolutely. Paolo

On 03/28/2013 04:04 AM, Paolo Bonzini wrote:
/* Change to the temp capabilities */ - if ((capng_ret = capng_apply(CAPNG_SELECT_BOTH)) < 0) { + if ((capng_ret = capng_apply(CAPNG_SELECT_CAPS)) < 0) {
Beforehand, we limited both caps and bounding set, with an overlarge set, now you are limiting just caps...
+ /* Set bounding set while we have CAP_SETPCAP. Unfortunately we cannot + * do this if we failed to get the capability above, so ignore the + * return value. + */ + capng_apply(CAPNG_SELECT_BOUNDS);
...and then separately limiting bounds, but still while having an overlarge set.
capng_update(CAPNG_DROP, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETPCAP);
Here, the set is now pruned to size...
- if (need_prctl && ((capng_ret = capng_apply(CAPNG_SELECT_BOTH)) < 0)) { + if (((capng_ret = capng_apply(CAPNG_SELECT_CAPS)) < 0)) {
...but you are now only limiting caps, not the bounding set. Is that correct?
Yes, the code after capng(CAPNG_SELECT_BOUNDS) does not affect the bounding set.
Ah, I see now - the CAPNG_* flags to the second parameter of capng_update did not alter CAPNG_BOUNDING_SET. ACK; I've gone ahead and pushed this patch. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

When running unprivileged, virSetUIDGIDWithCaps will fail because it tries to add the requested capabilities to the permitted and effective sets. Detect this case, and invoke the child with cleared permitted and effective sets. If it is a setuid program, it will get them. Some care is needed also because you cannot drop capabilities from the bounding set without CAP_SETPCAP. Because of that, ignore errors from setting the bounding set. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- src/util/virutil.c | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/src/util/virutil.c b/src/util/virutil.c index 36e6cee..8104e89 100644 --- a/src/util/virutil.c +++ b/src/util/virutil.c @@ -3052,9 +3052,21 @@ virSetUIDGIDWithCaps(uid_t uid, gid_t gid, unsigned long long capBits, /* Change to the temp capabilities */ if ((capng_ret = capng_apply(CAPNG_SELECT_CAPS)) < 0) { - virReportError(VIR_ERR_INTERNAL_ERROR, - _("cannot apply process capabilities %d"), capng_ret); - goto cleanup; + /* Failed. If we are running unprivileged, and the arguments make sense + * for this scenario, assume we're starting some kind of setuid helper: + * do not set any of capBits in the permitted or effective sets, and let + * the program get them on its own. + * + * (Too bad we cannot restrict the bounding set to the capabilities we + * would like the helper to have!). + */ + if (getuid() > 0 && clearExistingCaps && !need_setuid && !need_setgid) { + capng_clear(CAPNG_SELECT_CAPS); + } else { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("cannot apply process capabilities %d"), capng_ret); + goto cleanup; + } } if (virSetUIDGID(uid, gid) < 0) -- 1.8.1.4

On 03/25/2013 08:25 AM, Paolo Bonzini wrote:
When running unprivileged, virSetUIDGIDWithCaps will fail because it tries to add the requested capabilities to the permitted and effective sets.
Detect this case, and invoke the child with cleared permitted and effective sets. If it is a setuid program, it will get them.
Some care is needed also because you cannot drop capabilities from the bounding set without CAP_SETPCAP. Because of that, ignore errors from setting the bounding set.
That seems like a kernel flaw - it makes sense that you can't _add_ capabilities without CAP_SETPCAP, but being unable to _drop_ capabilities without first acquiring a capability seems backwards. I wonder if lkml would accept a patch that makes CAP_SETPCAP unnecessary for the restriction case, and only require it for the case of gaining capabilities. But regardless of that thought experiment, I agree that the behavior is what it is in released kernels, so we must deal with it.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- src/util/virutil.c | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-)
diff --git a/src/util/virutil.c b/src/util/virutil.c index 36e6cee..8104e89 100644 --- a/src/util/virutil.c +++ b/src/util/virutil.c @@ -3052,9 +3052,21 @@ virSetUIDGIDWithCaps(uid_t uid, gid_t gid, unsigned long long capBits,
/* Change to the temp capabilities */ if ((capng_ret = capng_apply(CAPNG_SELECT_CAPS)) < 0) { - virReportError(VIR_ERR_INTERNAL_ERROR, - _("cannot apply process capabilities %d"), capng_ret); - goto cleanup; + /* Failed. If we are running unprivileged, and the arguments make sense + * for this scenario, assume we're starting some kind of setuid helper: + * do not set any of capBits in the permitted or effective sets, and let + * the program get them on its own. + * + * (Too bad we cannot restrict the bounding set to the capabilities we + * would like the helper to have!). + */ + if (getuid() > 0 && clearExistingCaps && !need_setuid && !need_setgid) { + capng_clear(CAPNG_SELECT_CAPS); + } else { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("cannot apply process capabilities %d"), capng_ret); + goto cleanup; + }
Hmm, this seems like we may want it for 1.0.4, but I still had questions on 1/5 which needs to be applied first. As written, the patch makes sense. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

That seems like a kernel flaw - it makes sense that you can't _add_ capabilities without CAP_SETPCAP, but being unable to _drop_ capabilities without first acquiring a capability seems backwards.
You cannot add capabilities to the bounding set at all. It is a one-way street. /me learned a lot of things while writing these two patches. In fact, capng_apply(CAPNG_SELECT_BOUNDS) will never fail, but I preferred to be conservative in patch 1 just in case this changes in the future.
Hmm, this seems like we may want it for 1.0.4
I do not think so, there should not be any cases right now where unprivileged libvirt calls a setuid helper. Paolo

On 03/27/2013 04:46 PM, Eric Blake wrote:
On 03/25/2013 08:25 AM, Paolo Bonzini wrote:
When running unprivileged, virSetUIDGIDWithCaps will fail because it tries to add the requested capabilities to the permitted and effective sets.
Detect this case, and invoke the child with cleared permitted and effective sets. If it is a setuid program, it will get them.
Some care is needed also because you cannot drop capabilities from the bounding set without CAP_SETPCAP. Because of that, ignore errors from setting the bounding set.
As written, the patch makes sense.
ACK and pushed. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Il 27/03/2013 23:46, Eric Blake ha scritto:
That seems like a kernel flaw - it makes sense that you can't _add_ capabilities without CAP_SETPCAP, but being unable to _drop_ capabilities without first acquiring a capability seems backwards. I wonder if lkml would accept a patch that makes CAP_SETPCAP unnecessary for the restriction case, and only require it for the case of gaining capabilities.
The worry here is that dropping _some_ caps but not all lets you exploit untested error paths in suid binaries. The solution could be to install libvirtd as suid-root and drop all capabilities except CAP_SETPCAP when running unprivileged. Alternatively, you could use file capabilities to the same effect. Paolo -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQIcBAEBAgAGBQJRcPGaAAoJEBvWZb6bTYbyYOQP/A8pZ1uDgwpeF23KMQdg2Rnl 5tt064KyjVs5NhQC8ntC1vulRQnEr0/qZt2NA7t4RiF3H2X9DVH5QmxMl6ELjxJI /GjScuswYOP5cmdj1xHKAJdmEV76C5f5NDra1CSw9iooWOdvrYLBgEqIEEp89i2s PKWN2GUbMesdFDgxf7mYaSqGYiIUI4UvON6bsPqJsuShyRvMsqffP+OQHbN7qNE3 O6TecxkpDFEnPJxOGKmemvLPDzTywPABaCJwozonB/xKzqWZ4w1EDB4czAlbhqVR zxtbWXFKJmZ1D72wgyTLeXtstJJnnh6DG9eHxGRf04+jVRGtMk7kOgZ8FE+WrsZQ VoAh8nAUI3kaZ9DfNPNuZ7Y7oT0venUWu5tCjnnO1hEMtFeSIfnP4h/BcV95wt+B I7jYco0pxCkBvWrwEXGkgHxMr+LMDav6nxo2ES5ToJUUGUAH6J8yPF7XYzppJ92k pFXueVMEIDgKZzRcuUyaaaDeZj8XQvGUN9I4s2x0CbRdQpOD+wdp0t10xoDXXeVw 1Sngs/yLptZqG5FY0g7Vt8Tnc21VrVShi2/dkacybjMCfHostQCzyLUfKR+4rQI1 oXvajO5lHpgnETLLdfo8Udjy90uWas09Hh940TKtA1dX6h8FVFyvplwm41HyMxnS oHZNxdw01qc/OmpoGYuP =Ob6t -----END PGP SIGNATURE-----

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- src/qemu/libvirtd_qemu.aug | 1 + src/qemu/qemu.conf | 8 ++++++++ src/qemu/qemu_conf.c | 3 +++ src/qemu/qemu_conf.h | 1 + src/qemu/test_libvirtd_qemu.aug.in | 1 + 5 files changed, 14 insertions(+) diff --git a/src/qemu/libvirtd_qemu.aug b/src/qemu/libvirtd_qemu.aug index 91f5f77..61740a9 100644 --- a/src/qemu/libvirtd_qemu.aug +++ b/src/qemu/libvirtd_qemu.aug @@ -60,6 +60,7 @@ module Libvirtd_qemu = let process_entry = str_entry "hugetlbfs_mount" | bool_entry "clear_emulator_capabilities" + | str_entry "bridge_helper" | bool_entry "set_process_name" | int_entry "max_processes" | int_entry "max_files" diff --git a/src/qemu/qemu.conf b/src/qemu/qemu.conf index dd853c8..87bdf70 100644 --- a/src/qemu/qemu.conf +++ b/src/qemu/qemu.conf @@ -301,6 +301,14 @@ #hugetlbfs_mount = "/dev/hugepages" +# Path to the setuid helper for creating tap devices. This executable +# is used to create <source type='bridge'> interfaces when libvirtd is +# running unprivileged. libvirt invokes the helper directly, instead +# of using "-netdev bridge", for security reasons. +#bridge_helper = "/usr/libexec/qemu-bridge-helper" + + + # If clear_emulator_capabilities is enabled, libvirt will drop all # privileged capabilities of the QEmu/KVM emulator. This is enabled by # default. diff --git a/src/qemu/qemu_conf.c b/src/qemu/qemu_conf.c index c2e2e10..f648fc3 100644 --- a/src/qemu/qemu_conf.c +++ b/src/qemu/qemu_conf.c @@ -248,6 +248,7 @@ virQEMUDriverConfigPtr virQEMUDriverConfigNew(bool privileged) } } #endif + cfg->bridgeHelperName = strdup("/usr/libexec/qemu-bridge-helper"); cfg->clearEmulatorCapabilities = true; @@ -297,6 +298,7 @@ static void virQEMUDriverConfigDispose(void *obj) VIR_FREE(cfg->hugetlbfsMount); VIR_FREE(cfg->hugepagePath); + VIR_FREE(cfg->bridgeHelperName); VIR_FREE(cfg->saveImageFormat); VIR_FREE(cfg->dumpImageFormat); @@ -509,6 +511,7 @@ int virQEMUDriverConfigLoadFile(virQEMUDriverConfigPtr cfg, GET_VALUE_BOOL("auto_start_bypass_cache", cfg->autoStartBypassCache); GET_VALUE_STR("hugetlbfs_mount", cfg->hugetlbfsMount); + GET_VALUE_STR("bridge_helper", cfg->bridgeHelperName); GET_VALUE_BOOL("mac_filter", cfg->macFilter); diff --git a/src/qemu/qemu_conf.h b/src/qemu/qemu_conf.h index c5ddaad..666ac33 100644 --- a/src/qemu/qemu_conf.h +++ b/src/qemu/qemu_conf.h @@ -117,6 +117,7 @@ struct _virQEMUDriverConfig { char *hugetlbfsMount; char *hugepagePath; + char *bridgeHelperName; bool macFilter; diff --git a/src/qemu/test_libvirtd_qemu.aug.in b/src/qemu/test_libvirtd_qemu.aug.in index 2892204..0aec997 100644 --- a/src/qemu/test_libvirtd_qemu.aug.in +++ b/src/qemu/test_libvirtd_qemu.aug.in @@ -49,6 +49,7 @@ module Test_libvirtd_qemu = { "auto_dump_bypass_cache" = "0" } { "auto_start_bypass_cache" = "0" } { "hugetlbfs_mount" = "/dev/hugepages" } +{ "bridge_helper" = "/usr/libexec/qemu-bridge-helper" } { "clear_emulator_capabilities" = "1" } { "set_process_name" = "1" } { "max_processes" = "0" } -- 1.8.1.4

On 03/25/2013 10:25 AM, Paolo Bonzini wrote:
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- src/qemu/libvirtd_qemu.aug | 1 + src/qemu/qemu.conf | 8 ++++++++ src/qemu/qemu_conf.c | 3 +++ src/qemu/qemu_conf.h | 1 + src/qemu/test_libvirtd_qemu.aug.in | 1 + 5 files changed, 14 insertions(+)
diff --git a/src/qemu/libvirtd_qemu.aug b/src/qemu/libvirtd_qemu.aug index 91f5f77..61740a9 100644 --- a/src/qemu/libvirtd_qemu.aug +++ b/src/qemu/libvirtd_qemu.aug @@ -60,6 +60,7 @@ module Libvirtd_qemu =
let process_entry = str_entry "hugetlbfs_mount" | bool_entry "clear_emulator_capabilities" + | str_entry "bridge_helper" | bool_entry "set_process_name" | int_entry "max_processes" | int_entry "max_files" diff --git a/src/qemu/qemu.conf b/src/qemu/qemu.conf index dd853c8..87bdf70 100644 --- a/src/qemu/qemu.conf +++ b/src/qemu/qemu.conf @@ -301,6 +301,14 @@ #hugetlbfs_mount = "/dev/hugepages"
+# Path to the setuid helper for creating tap devices. This executable +# is used to create <source type='bridge'> interfaces when libvirtd is +# running unprivileged. libvirt invokes the helper directly, instead +# of using "-netdev bridge", for security reasons. +#bridge_helper = "/usr/libexec/qemu-bridge-helper" + +
Are we sure we want to allow this to be configured? That could lead to some "interesting" troubleshooting incidents :-) On the other hand, I guess the path to qemu itself is right there in the domain config file, so how much worse could this be...
+ # If clear_emulator_capabilities is enabled, libvirt will drop all # privileged capabilities of the QEmu/KVM emulator. This is enabled by # default. diff --git a/src/qemu/qemu_conf.c b/src/qemu/qemu_conf.c index c2e2e10..f648fc3 100644 --- a/src/qemu/qemu_conf.c +++ b/src/qemu/qemu_conf.c @@ -248,6 +248,7 @@ virQEMUDriverConfigPtr virQEMUDriverConfigNew(bool privileged) } } #endif + cfg->bridgeHelperName = strdup("/usr/libexec/qemu-bridge-helper");
cfg->clearEmulatorCapabilities = true;
@@ -297,6 +298,7 @@ static void virQEMUDriverConfigDispose(void *obj)
VIR_FREE(cfg->hugetlbfsMount); VIR_FREE(cfg->hugepagePath); + VIR_FREE(cfg->bridgeHelperName);
VIR_FREE(cfg->saveImageFormat); VIR_FREE(cfg->dumpImageFormat); @@ -509,6 +511,7 @@ int virQEMUDriverConfigLoadFile(virQEMUDriverConfigPtr cfg, GET_VALUE_BOOL("auto_start_bypass_cache", cfg->autoStartBypassCache);
GET_VALUE_STR("hugetlbfs_mount", cfg->hugetlbfsMount); + GET_VALUE_STR("bridge_helper", cfg->bridgeHelperName);
GET_VALUE_BOOL("mac_filter", cfg->macFilter);
diff --git a/src/qemu/qemu_conf.h b/src/qemu/qemu_conf.h index c5ddaad..666ac33 100644 --- a/src/qemu/qemu_conf.h +++ b/src/qemu/qemu_conf.h @@ -117,6 +117,7 @@ struct _virQEMUDriverConfig {
char *hugetlbfsMount; char *hugepagePath; + char *bridgeHelperName;
bool macFilter;
diff --git a/src/qemu/test_libvirtd_qemu.aug.in b/src/qemu/test_libvirtd_qemu.aug.in index 2892204..0aec997 100644 --- a/src/qemu/test_libvirtd_qemu.aug.in +++ b/src/qemu/test_libvirtd_qemu.aug.in @@ -49,6 +49,7 @@ module Test_libvirtd_qemu = { "auto_dump_bypass_cache" = "0" } { "auto_start_bypass_cache" = "0" } { "hugetlbfs_mount" = "/dev/hugepages" } +{ "bridge_helper" = "/usr/libexec/qemu-bridge-helper" } { "clear_emulator_capabilities" = "1" } { "set_process_name" = "1" } { "max_processes" = "0" }
ACK. (But I'd like at least one other ACK from someone else due to the fact that this is polluting the config namespace with something we would like to eventually eliminate.)

On 04/18/2013 11:35 AM, Laine Stump wrote:
+# Path to the setuid helper for creating tap devices. This executable +# is used to create <source type='bridge'> interfaces when libvirtd is +# running unprivileged. libvirt invokes the helper directly, instead +# of using "-netdev bridge", for security reasons. +#bridge_helper = "/usr/libexec/qemu-bridge-helper" + +
Are we sure we want to allow this to be configured? That could lead to some "interesting" troubleshooting incidents :-)
About the only time it would be configured is if qemu is installed in an alternate location.
On the other hand, I guess the path to qemu itself is right there in the domain config file, so how much worse could this be...
Yeah, sometimes we've got to just trust the user to not be insane.
ACK. (But I'd like at least one other ACK from someone else due to the fact that this is polluting the config namespace with something we would like to eventually eliminate.)
Even if we add a way for libvirt to get the tap device without depending on qemu's helper program, we'll have to leave the config item present (so we don't reject an older .conf file as invalid), but we can then ignore the entry at that point. I can live with this change going in, so I agree with your ACK, and have pushed it. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On Thu, Apr 18, 2013 at 01:35:51PM -0400, Laine Stump wrote:
On 03/25/2013 10:25 AM, Paolo Bonzini wrote:
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- src/qemu/libvirtd_qemu.aug | 1 + src/qemu/qemu.conf | 8 ++++++++ src/qemu/qemu_conf.c | 3 +++ src/qemu/qemu_conf.h | 1 + src/qemu/test_libvirtd_qemu.aug.in | 1 + 5 files changed, 14 insertions(+)
diff --git a/src/qemu/libvirtd_qemu.aug b/src/qemu/libvirtd_qemu.aug index 91f5f77..61740a9 100644 --- a/src/qemu/libvirtd_qemu.aug +++ b/src/qemu/libvirtd_qemu.aug @@ -60,6 +60,7 @@ module Libvirtd_qemu =
let process_entry = str_entry "hugetlbfs_mount" | bool_entry "clear_emulator_capabilities" + | str_entry "bridge_helper" | bool_entry "set_process_name" | int_entry "max_processes" | int_entry "max_files" diff --git a/src/qemu/qemu.conf b/src/qemu/qemu.conf index dd853c8..87bdf70 100644 --- a/src/qemu/qemu.conf +++ b/src/qemu/qemu.conf @@ -301,6 +301,14 @@ #hugetlbfs_mount = "/dev/hugepages"
+# Path to the setuid helper for creating tap devices. This executable +# is used to create <source type='bridge'> interfaces when libvirtd is +# running unprivileged. libvirt invokes the helper directly, instead +# of using "-netdev bridge", for security reasons. +#bridge_helper = "/usr/libexec/qemu-bridge-helper" + +
Are we sure we want to allow this to be configured? That could lead to some "interesting" troubleshooting incidents :-)
On the other hand, I guess the path to qemu itself is right there in the domain config file, so how much worse could this be...
+ # If clear_emulator_capabilities is enabled, libvirt will drop all # privileged capabilities of the QEmu/KVM emulator. This is enabled by # default. diff --git a/src/qemu/qemu_conf.c b/src/qemu/qemu_conf.c index c2e2e10..f648fc3 100644 --- a/src/qemu/qemu_conf.c +++ b/src/qemu/qemu_conf.c @@ -248,6 +248,7 @@ virQEMUDriverConfigPtr virQEMUDriverConfigNew(bool privileged) } } #endif + cfg->bridgeHelperName = strdup("/usr/libexec/qemu-bridge-helper");
cfg->clearEmulatorCapabilities = true;
@@ -297,6 +298,7 @@ static void virQEMUDriverConfigDispose(void *obj)
VIR_FREE(cfg->hugetlbfsMount); VIR_FREE(cfg->hugepagePath); + VIR_FREE(cfg->bridgeHelperName);
VIR_FREE(cfg->saveImageFormat); VIR_FREE(cfg->dumpImageFormat); @@ -509,6 +511,7 @@ int virQEMUDriverConfigLoadFile(virQEMUDriverConfigPtr cfg, GET_VALUE_BOOL("auto_start_bypass_cache", cfg->autoStartBypassCache);
GET_VALUE_STR("hugetlbfs_mount", cfg->hugetlbfsMount); + GET_VALUE_STR("bridge_helper", cfg->bridgeHelperName);
GET_VALUE_BOOL("mac_filter", cfg->macFilter);
diff --git a/src/qemu/qemu_conf.h b/src/qemu/qemu_conf.h index c5ddaad..666ac33 100644 --- a/src/qemu/qemu_conf.h +++ b/src/qemu/qemu_conf.h @@ -117,6 +117,7 @@ struct _virQEMUDriverConfig {
char *hugetlbfsMount; char *hugepagePath; + char *bridgeHelperName;
bool macFilter;
diff --git a/src/qemu/test_libvirtd_qemu.aug.in b/src/qemu/test_libvirtd_qemu.aug.in index 2892204..0aec997 100644 --- a/src/qemu/test_libvirtd_qemu.aug.in +++ b/src/qemu/test_libvirtd_qemu.aug.in @@ -49,6 +49,7 @@ module Test_libvirtd_qemu = { "auto_dump_bypass_cache" = "0" } { "auto_start_bypass_cache" = "0" } { "hugetlbfs_mount" = "/dev/hugepages" } +{ "bridge_helper" = "/usr/libexec/qemu-bridge-helper" } { "clear_emulator_capabilities" = "1" } { "set_process_name" = "1" } { "max_processes" = "0" }
ACK. (But I'd like at least one other ACK from someone else due to the fact that this is polluting the config namespace with something we would like to eventually eliminate.)
We have to support this really, since we need to cope with people installing into non-/usr locations ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

This will be used on a tap file descriptor returned by the bridge helper to populate the <target> element, because the helper does not provide the interface name. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virnetdevtap.c | 33 +++++++++++++++++++++++++++++++++ src/util/virnetdevtap.h | 3 +++ 3 files changed, 37 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index f241ec4..06085ba 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1515,6 +1515,7 @@ virNetDevOpenvswitchSetMigrateData; virNetDevTapCreate; virNetDevTapCreateInBridgePort; virNetDevTapDelete; +virNetDevTapGetName; # util/virnetdevveth.h diff --git a/src/util/virnetdevtap.c b/src/util/virnetdevtap.c index a884de1..e9fddf1 100644 --- a/src/util/virnetdevtap.c +++ b/src/util/virnetdevtap.c @@ -44,6 +44,39 @@ #define VIR_FROM_THIS VIR_FROM_NONE /** + * virNetDevTapGetName: + * @tapfd: a tun/tap file descriptor + * @ifname: a pointer that will receive the interface name + * + * Retrieve the interface name given a file descriptor for a tun/tap + * interface. + * + * Returns 0 if the interface name is successfully queried, -1 otherwise + */ +int +virNetDevTapGetName(int tapfd, char **ifname) +{ +#ifdef TUNGETIFF + struct ifreq ifr; + + /* The kernel will always return -1 at this point. + * If TUNGETIFF is not implemented then errno == EBADFD. + */ + if (ioctl(tapfd, TUNGETIFF, &ifr) < 0) { + virReportSystemError(errno, "%s", + _("Unable to query tap interface name")); + return -1; + } + + *ifname = strdup(ifr.ifr_name); + return 0; +#else + return -1; +#endif +} + + +/** * virNetDevProbeVnetHdr: * @tapfd: a tun/tap file descriptor * diff --git a/src/util/virnetdevtap.h b/src/util/virnetdevtap.h index 980db61..6bfc80c 100644 --- a/src/util/virnetdevtap.h +++ b/src/util/virnetdevtap.h @@ -35,6 +35,9 @@ int virNetDevTapCreate(char **ifname, int virNetDevTapDelete(const char *ifname) ATTRIBUTE_NONNULL(1) ATTRIBUTE_RETURN_CHECK; +int virNetDevTapGetName(int tapfd, char **ifname) + ATTRIBUTE_NONNULL(2) ATTRIBUTE_RETURN_CHECK; + typedef enum { VIR_NETDEV_TAP_CREATE_NONE = 0, /* Bring the interface up */ -- 1.8.1.4

On 03/25/2013 10:25 AM, Paolo Bonzini wrote:
This will be used on a tap file descriptor returned by the bridge helper to populate the <target> element, because the helper does not provide the interface name.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virnetdevtap.c | 33 +++++++++++++++++++++++++++++++++ src/util/virnetdevtap.h | 3 +++ 3 files changed, 37 insertions(+)
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index f241ec4..06085ba 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1515,6 +1515,7 @@ virNetDevOpenvswitchSetMigrateData; virNetDevTapCreate; virNetDevTapCreateInBridgePort; virNetDevTapDelete; +virNetDevTapGetName;
# util/virnetdevveth.h diff --git a/src/util/virnetdevtap.c b/src/util/virnetdevtap.c index a884de1..e9fddf1 100644 --- a/src/util/virnetdevtap.c +++ b/src/util/virnetdevtap.c @@ -44,6 +44,39 @@ #define VIR_FROM_THIS VIR_FROM_NONE
/** + * virNetDevTapGetName: + * @tapfd: a tun/tap file descriptor + * @ifname: a pointer that will receive the interface name + * + * Retrieve the interface name given a file descriptor for a tun/tap + * interface. + * + * Returns 0 if the interface name is successfully queried, -1 otherwise + */ +int +virNetDevTapGetName(int tapfd, char **ifname) +{ +#ifdef TUNGETIFF + struct ifreq ifr; + + /* The kernel will always return -1 at this point. + * If TUNGETIFF is not implemented then errno == EBADFD. + */ + if (ioctl(tapfd, TUNGETIFF, &ifr) < 0) { + virReportSystemError(errno, "%s", + _("Unable to query tap interface name")); + return -1; + } + + *ifname = strdup(ifr.ifr_name);
You need to check for OOM here. (actually there are patches pending somewhere to replace strdup() with a VIR_STRDUP() or something like that which does the error reporting for you, but you'll still need to return -1 if it fails).
+ return 0; +#else + return -1; +#endif +} + + +/** * virNetDevProbeVnetHdr: * @tapfd: a tun/tap file descriptor * diff --git a/src/util/virnetdevtap.h b/src/util/virnetdevtap.h index 980db61..6bfc80c 100644 --- a/src/util/virnetdevtap.h +++ b/src/util/virnetdevtap.h @@ -35,6 +35,9 @@ int virNetDevTapCreate(char **ifname, int virNetDevTapDelete(const char *ifname) ATTRIBUTE_NONNULL(1) ATTRIBUTE_RETURN_CHECK;
+int virNetDevTapGetName(int tapfd, char **ifname) + ATTRIBUTE_NONNULL(2) ATTRIBUTE_RETURN_CHECK; + typedef enum { VIR_NETDEV_TAP_CREATE_NONE = 0, /* Bring the interface up */

<source type='bridge'> uses a helper application to do the necessary TUN/TAP setup to use an existing network bridge, thus letting unprivileged users use TUN/TAP interfaces. However, libvirt should be preventing QEMU from running any setuid programs at all, which would include this helper program. From a security POV, any setuid helper needs to be run by libvirtd itself, not QEMU. This is what this patch does. libvirt now invokes the setuid helper, gets the TAP fd and then passes it to QEMU in the normal manner. The path to the helper is specified in qemu.conf. As a small advantage, this adds a <target dev='tap0'/> element to the XML of an active domain using <interface type='bridge'>. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- src/qemu/qemu_command.c | 133 +++++++++++++++++++++++++++++++++++------------- src/qemu/qemu_command.h | 1 - src/qemu/qemu_hotplug.c | 25 +++------ 3 files changed, 106 insertions(+), 53 deletions(-) diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index a0c278f..fa31102 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -28,12 +28,14 @@ #include "qemu_capabilities.h" #include "qemu_bridge_filter.h" #include "cpu/cpu.h" +#include "passfd.h" #include "viralloc.h" #include "virlog.h" #include "virarch.h" #include "virerror.h" #include "virutil.h" #include "virfile.h" +#include "virnetdev.h" #include "virstring.h" #include "viruuid.h" #include "c-ctype.h" @@ -46,6 +48,9 @@ #include "base64.h" #include "device_conf.h" #include "virstoragefile.h" +#if defined(__linux__) +# include <linux/capability.h> +#endif #include <sys/stat.h> #include <fcntl.h> @@ -193,6 +198,77 @@ error: } +/** + * qemuCreateInBridgePortWithHelper: + * @cfg: the configuration object in which the helper name is looked up + * @brname: the bridge name + * @ifname: the returned interface name + * @macaddr: the returned MAC address + * @tapfd: file descriptor return value for the new tap device + * @flags: OR of virNetDevTapCreateFlags: + + * VIR_NETDEV_TAP_CREATE_VNET_HDR + * - Enable IFF_VNET_HDR on the tap device + * + * This function creates a new tap device on a bridge using an external + * helper. The final name for the bridge will be stored in @ifname. + * + * Returns 0 in case of success or -1 on failure + */ +static int qemuCreateInBridgePortWithHelper(virQEMUDriverConfigPtr cfg, + const char *brname, + char **ifname, + int *tapfd, + unsigned int flags) +{ + virCommandPtr cmd; + int status; + int pair[2] = { -1, -1 }; + + if ((flags & ~VIR_NETDEV_TAP_CREATE_VNET_HDR) != VIR_NETDEV_TAP_CREATE_IFUP) + return -1; + + if (socketpair(AF_UNIX, SOCK_STREAM, 0, pair) < 0) { + virReportSystemError(errno, "%s", _("failed to create socket")); + return -1; + } + + cmd = virCommandNew(cfg->bridgeHelperName); + if (flags & VIR_NETDEV_TAP_CREATE_VNET_HDR) + virCommandAddArgFormat(cmd, "--use-vnet"); + virCommandAddArgFormat(cmd, "--br=%s", brname); + virCommandAddArgFormat(cmd, "--fd=%d", pair[1]); + virCommandTransferFD(cmd, pair[1]); + virCommandClearCaps(cmd); +#ifdef CAP_NET_ADMIN + virCommandAllowCap(cmd, CAP_NET_ADMIN); +#endif + if (virCommandRunAsync(cmd, NULL) < 0) { + *tapfd = -1; + goto out; + } + + do { + *tapfd = recvfd(pair[0], 0); + } while (*tapfd < 0 && errno == EINTR); + if (*tapfd < 0) { + virReportSystemError(errno, "%s", + _("failed to retrieve file descriptor for interface")); + goto out; + } + + if (virNetDevTapGetName(*tapfd, ifname) < 0 || + virCommandWait(cmd, &status) < 0) { + VIR_FORCE_CLOSE(*tapfd); + *tapfd = -1; + } + +out: + virCommandFree(cmd); + VIR_FORCE_CLOSE(pair[0]); + return *tapfd < 0 ? -1 : 0; +} + int qemuNetworkIfaceConnect(virDomainDefPtr def, virConnectPtr conn, @@ -270,11 +346,17 @@ qemuNetworkIfaceConnect(virDomainDefPtr def, tap_create_flags |= VIR_NETDEV_TAP_CREATE_VNET_HDR; } - err = virNetDevTapCreateInBridgePort(brname, &net->ifname, &net->mac, - def->uuid, &tapfd, - virDomainNetGetActualVirtPortProfile(net), - virDomainNetGetActualVlan(net), - tap_create_flags); + if (cfg->privileged) + err = virNetDevTapCreateInBridgePort(brname, &net->ifname, &net->mac, + def->uuid, &tapfd, + virDomainNetGetActualVirtPortProfile(net), + virDomainNetGetActualVlan(net), + tap_create_flags); + else + err = qemuCreateInBridgePortWithHelper(cfg, brname, + &net->ifname, + &tapfd, tap_create_flags); + virDomainAuditNetDevice(def, net, "/dev/net/tun", tapfd >= 0); if (err < 0) { if (template_ifname) @@ -3746,7 +3828,6 @@ error: char * qemuBuildHostNetStr(virDomainNetDefPtr net, virQEMUDriverPtr driver, - virQEMUCapsPtr qemuCaps, char type_sep, int vlan, const char *tapfd, @@ -3755,7 +3836,6 @@ qemuBuildHostNetStr(virDomainNetDefPtr net, bool is_tap = false; virBuffer buf = VIR_BUFFER_INITIALIZER; enum virDomainNetType netType = virDomainNetGetActualType(net); - const char *brname = NULL; virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); if (net->script && netType != VIR_DOMAIN_NET_TYPE_ETHERNET) { @@ -3773,14 +3853,6 @@ qemuBuildHostNetStr(virDomainNetDefPtr net, * through, -net tap,fd */ case VIR_DOMAIN_NET_TYPE_BRIDGE: - if (!cfg->privileged && - virQEMUCapsGet(qemuCaps, QEMU_CAPS_NETDEV_BRIDGE)) { - brname = virDomainNetGetActualBridgeName(net); - virBufferAsprintf(&buf, "bridge%cbr=%s", type_sep, brname); - type_sep = ','; - is_tap = true; - break; - } case VIR_DOMAIN_NET_TYPE_NETWORK: case VIR_DOMAIN_NET_TYPE_DIRECT: virBufferAsprintf(&buf, "tap%cfd=%s", type_sep, tapfd); @@ -6681,26 +6753,17 @@ qemuBuildCommandLine(virConnectPtr conn, if (actualType == VIR_DOMAIN_NET_TYPE_NETWORK || actualType == VIR_DOMAIN_NET_TYPE_BRIDGE) { - /* - * If type='bridge' then we attempt to allocate the tap fd here only if - * running under a privilged user or -netdev bridge option is not - * supported. - */ - if (actualType == VIR_DOMAIN_NET_TYPE_NETWORK || - cfg->privileged || - (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_NETDEV_BRIDGE))) { - int tapfd = qemuNetworkIfaceConnect(def, conn, driver, net, - qemuCaps); - if (tapfd < 0) - goto error; + int tapfd = qemuNetworkIfaceConnect(def, conn, driver, net, + qemuCaps); + if (tapfd < 0) + goto error; - last_good_net = i; - virCommandTransferFD(cmd, tapfd); + last_good_net = i; + virCommandTransferFD(cmd, tapfd); - if (snprintf(tapfd_name, sizeof(tapfd_name), "%d", - tapfd) >= sizeof(tapfd_name)) - goto no_memory; - } + if (snprintf(tapfd_name, sizeof(tapfd_name), "%d", + tapfd) >= sizeof(tapfd_name)) + goto no_memory; } else if (actualType == VIR_DOMAIN_NET_TYPE_DIRECT) { int tapfd = qemuPhysIfaceConnect(def, driver, net, qemuCaps, vmop); @@ -6744,7 +6807,7 @@ qemuBuildCommandLine(virConnectPtr conn, if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_NETDEV) && virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE)) { virCommandAddArg(cmd, "-netdev"); - if (!(host = qemuBuildHostNetStr(net, driver, qemuCaps, + if (!(host = qemuBuildHostNetStr(net, driver, ',', vlan, tapfd_name, vhostfd_name))) goto error; @@ -6768,7 +6831,7 @@ qemuBuildCommandLine(virConnectPtr conn, if (!(virQEMUCapsGet(qemuCaps, QEMU_CAPS_NETDEV) && virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE))) { virCommandAddArg(cmd, "-net"); - if (!(host = qemuBuildHostNetStr(net, driver, qemuCaps, + if (!(host = qemuBuildHostNetStr(net, driver, ',', vlan, tapfd_name, vhostfd_name))) goto error; diff --git a/src/qemu/qemu_command.h b/src/qemu/qemu_command.h index 17687f4..267a96e 100644 --- a/src/qemu/qemu_command.h +++ b/src/qemu/qemu_command.h @@ -71,7 +71,6 @@ qemuBuildChrDeviceStr (virDomainChrDefPtr serial, /* With vlan == -1, use netdev syntax, else old hostnet */ char * qemuBuildHostNetStr(virDomainNetDefPtr net, virQEMUDriverPtr driver, - virQEMUCapsPtr qemuCaps, char type_sep, int vlan, const char *tapfd, diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index de9edd4..6891efd 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -729,21 +729,12 @@ int qemuDomainAttachNetDevice(virConnectPtr conn, if (actualType == VIR_DOMAIN_NET_TYPE_BRIDGE || actualType == VIR_DOMAIN_NET_TYPE_NETWORK) { - /* - * If type=bridge then we attempt to allocate the tap fd here only if - * running under a privilged user or -netdev bridge option is not - * supported. - */ - if (actualType == VIR_DOMAIN_NET_TYPE_NETWORK || - cfg->privileged || - (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_NETDEV_BRIDGE))) { - if ((tapfd = qemuNetworkIfaceConnect(vm->def, conn, driver, net, - priv->qemuCaps)) < 0) - goto cleanup; - iface_connected = true; - if (qemuOpenVhostNet(vm->def, net, priv->qemuCaps, &vhostfd) < 0) - goto cleanup; - } + if ((tapfd = qemuNetworkIfaceConnect(vm->def, conn, driver, net, + priv->qemuCaps)) < 0) + goto cleanup; + iface_connected = true; + if (qemuOpenVhostNet(vm->def, net, priv->qemuCaps, &vhostfd) < 0) + goto cleanup; } else if (actualType == VIR_DOMAIN_NET_TYPE_DIRECT) { if ((tapfd = qemuPhysIfaceConnect(vm->def, driver, net, priv->qemuCaps, @@ -803,12 +794,12 @@ int qemuDomainAttachNetDevice(virConnectPtr conn, if (virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_NETDEV) && virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_DEVICE)) { - if (!(netstr = qemuBuildHostNetStr(net, driver, priv->qemuCaps, + if (!(netstr = qemuBuildHostNetStr(net, driver, ',', -1, tapfd_name, vhostfd_name))) goto cleanup; } else { - if (!(netstr = qemuBuildHostNetStr(net, driver, priv->qemuCaps, + if (!(netstr = qemuBuildHostNetStr(net, driver, ' ', vlan, tapfd_name, vhostfd_name))) goto cleanup; -- 1.8.1.4

On 03/25/2013 10:25 AM, Paolo Bonzini wrote:
<source type='bridge'> uses a helper application to do the necessary TUN/TAP setup to use an existing network bridge, thus letting unprivileged users use TUN/TAP interfaces.
However, libvirt should be preventing QEMU from running any setuid programs at all, which would include this helper program. From a security POV, any setuid helper needs to be run by libvirtd itself, not QEMU.
This is what this patch does. libvirt now invokes the setuid helper, gets the TAP fd and then passes it to QEMU in the normal manner. The path to the helper is specified in qemu.conf.
As a small advantage, this adds a <target dev='tap0'/> element to the XML of an active domain using <interface type='bridge'>.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- src/qemu/qemu_command.c | 133 +++++++++++++++++++++++++++++++++++------------- src/qemu/qemu_command.h | 1 - src/qemu/qemu_hotplug.c | 25 +++------ 3 files changed, 106 insertions(+), 53 deletions(-)
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index a0c278f..fa31102 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -28,12 +28,14 @@ #include "qemu_capabilities.h" #include "qemu_bridge_filter.h" #include "cpu/cpu.h" +#include "passfd.h" #include "viralloc.h" #include "virlog.h" #include "virarch.h" #include "virerror.h" #include "virutil.h" #include "virfile.h" +#include "virnetdev.h" #include "virstring.h" #include "viruuid.h" #include "c-ctype.h" @@ -46,6 +48,9 @@ #include "base64.h" #include "device_conf.h" #include "virstoragefile.h" +#if defined(__linux__) +# include <linux/capability.h> +#endif
#include <sys/stat.h> #include <fcntl.h> @@ -193,6 +198,77 @@ error: }
+/** + * qemuCreateInBridgePortWithHelper: + * @cfg: the configuration object in which the helper name is looked up + * @brname: the bridge name + * @ifname: the returned interface name + * @macaddr: the returned MAC address + * @tapfd: file descriptor return value for the new tap device + * @flags: OR of virNetDevTapCreateFlags: + + * VIR_NETDEV_TAP_CREATE_VNET_HDR + * - Enable IFF_VNET_HDR on the tap device + * + * This function creates a new tap device on a bridge using an external + * helper. The final name for the bridge will be stored in @ifname. + * + * Returns 0 in case of success or -1 on failure + */ +static int qemuCreateInBridgePortWithHelper(virQEMUDriverConfigPtr cfg, + const char *brname, + char **ifname, + int *tapfd, + unsigned int flags) +{ + virCommandPtr cmd; + int status; + int pair[2] = { -1, -1 }; + + if ((flags & ~VIR_NETDEV_TAP_CREATE_VNET_HDR) != VIR_NETDEV_TAP_CREATE_IFUP) + return -1; + + if (socketpair(AF_UNIX, SOCK_STREAM, 0, pair) < 0) { + virReportSystemError(errno, "%s", _("failed to create socket")); + return -1; + } + + cmd = virCommandNew(cfg->bridgeHelperName); + if (flags & VIR_NETDEV_TAP_CREATE_VNET_HDR) + virCommandAddArgFormat(cmd, "--use-vnet"); + virCommandAddArgFormat(cmd, "--br=%s", brname); + virCommandAddArgFormat(cmd, "--fd=%d", pair[1]); + virCommandTransferFD(cmd, pair[1]); + virCommandClearCaps(cmd); +#ifdef CAP_NET_ADMIN + virCommandAllowCap(cmd, CAP_NET_ADMIN);
I thought you had said that attempts to set caps would fail because CAP_SETPCAP was cleared for the non-privileged libvirtd.
+#endif + if (virCommandRunAsync(cmd, NULL) < 0) { + *tapfd = -1; + goto out; + } + + do { + *tapfd = recvfd(pair[0], 0); + } while (*tapfd < 0 && errno == EINTR); + if (*tapfd < 0) { + virReportSystemError(errno, "%s", + _("failed to retrieve file descriptor for interface")); + goto out; + } + + if (virNetDevTapGetName(*tapfd, ifname) < 0 || + virCommandWait(cmd, &status) < 0) { + VIR_FORCE_CLOSE(*tapfd); + *tapfd = -1; + } + +out:
We prefer to name these labels "cleanup" rather that "out" (although there are some occurrences of "out" still in the code).
+ virCommandFree(cmd); + VIR_FORCE_CLOSE(pair[0]); + return *tapfd < 0 ? -1 : 0; +} + int qemuNetworkIfaceConnect(virDomainDefPtr def, virConnectPtr conn, @@ -270,11 +346,17 @@ qemuNetworkIfaceConnect(virDomainDefPtr def, tap_create_flags |= VIR_NETDEV_TAP_CREATE_VNET_HDR; }
- err = virNetDevTapCreateInBridgePort(brname, &net->ifname, &net->mac, - def->uuid, &tapfd, - virDomainNetGetActualVirtPortProfile(net), - virDomainNetGetActualVlan(net), - tap_create_flags); + if (cfg->privileged) + err = virNetDevTapCreateInBridgePort(brname, &net->ifname, &net->mac, + def->uuid, &tapfd, + virDomainNetGetActualVirtPortProfile(net), + virDomainNetGetActualVlan(net), + tap_create_flags); + else + err = qemuCreateInBridgePortWithHelper(cfg, brname, + &net->ifname, + &tapfd, tap_create_flags); + virDomainAuditNetDevice(def, net, "/dev/net/tun", tapfd >= 0); if (err < 0) { if (template_ifname) @@ -3746,7 +3828,6 @@ error: char * qemuBuildHostNetStr(virDomainNetDefPtr net, virQEMUDriverPtr driver, - virQEMUCapsPtr qemuCaps,
qemuCaps might again become useful for this function in the future, so you may want to leave it here (marked as ATTRIBUTE_UNUSED) to reduce code churn.
char type_sep, int vlan, const char *tapfd, @@ -3755,7 +3836,6 @@ qemuBuildHostNetStr(virDomainNetDefPtr net, bool is_tap = false; virBuffer buf = VIR_BUFFER_INITIALIZER; enum virDomainNetType netType = virDomainNetGetActualType(net); - const char *brname = NULL; virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver);
if (net->script && netType != VIR_DOMAIN_NET_TYPE_ETHERNET) { @@ -3773,14 +3853,6 @@ qemuBuildHostNetStr(virDomainNetDefPtr net, * through, -net tap,fd */ case VIR_DOMAIN_NET_TYPE_BRIDGE: - if (!cfg->privileged && - virQEMUCapsGet(qemuCaps, QEMU_CAPS_NETDEV_BRIDGE)) { - brname = virDomainNetGetActualBridgeName(net); - virBufferAsprintf(&buf, "bridge%cbr=%s", type_sep, brname); - type_sep = ','; - is_tap = true; - break; - } case VIR_DOMAIN_NET_TYPE_NETWORK: case VIR_DOMAIN_NET_TYPE_DIRECT: virBufferAsprintf(&buf, "tap%cfd=%s", type_sep, tapfd); @@ -6681,26 +6753,17 @@ qemuBuildCommandLine(virConnectPtr conn,
if (actualType == VIR_DOMAIN_NET_TYPE_NETWORK || actualType == VIR_DOMAIN_NET_TYPE_BRIDGE) { - /* - * If type='bridge' then we attempt to allocate the tap fd here only if - * running under a privilged user or -netdev bridge option is not - * supported. - */ - if (actualType == VIR_DOMAIN_NET_TYPE_NETWORK || - cfg->privileged || - (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_NETDEV_BRIDGE))) { - int tapfd = qemuNetworkIfaceConnect(def, conn, driver, net, - qemuCaps); - if (tapfd < 0) - goto error; + int tapfd = qemuNetworkIfaceConnect(def, conn, driver, net, + qemuCaps); + if (tapfd < 0) + goto error;
- last_good_net = i; - virCommandTransferFD(cmd, tapfd); + last_good_net = i; + virCommandTransferFD(cmd, tapfd);
- if (snprintf(tapfd_name, sizeof(tapfd_name), "%d", - tapfd) >= sizeof(tapfd_name)) - goto no_memory; - } + if (snprintf(tapfd_name, sizeof(tapfd_name), "%d", + tapfd) >= sizeof(tapfd_name)) + goto no_memory; } else if (actualType == VIR_DOMAIN_NET_TYPE_DIRECT) { int tapfd = qemuPhysIfaceConnect(def, driver, net, qemuCaps, vmop); @@ -6744,7 +6807,7 @@ qemuBuildCommandLine(virConnectPtr conn, if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_NETDEV) && virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE)) { virCommandAddArg(cmd, "-netdev"); - if (!(host = qemuBuildHostNetStr(net, driver, qemuCaps, + if (!(host = qemuBuildHostNetStr(net, driver, ',', vlan, tapfd_name, vhostfd_name))) goto error; @@ -6768,7 +6831,7 @@ qemuBuildCommandLine(virConnectPtr conn, if (!(virQEMUCapsGet(qemuCaps, QEMU_CAPS_NETDEV) && virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE))) { virCommandAddArg(cmd, "-net"); - if (!(host = qemuBuildHostNetStr(net, driver, qemuCaps, + if (!(host = qemuBuildHostNetStr(net, driver, ',', vlan, tapfd_name, vhostfd_name))) goto error; diff --git a/src/qemu/qemu_command.h b/src/qemu/qemu_command.h index 17687f4..267a96e 100644 --- a/src/qemu/qemu_command.h +++ b/src/qemu/qemu_command.h @@ -71,7 +71,6 @@ qemuBuildChrDeviceStr (virDomainChrDefPtr serial, /* With vlan == -1, use netdev syntax, else old hostnet */ char * qemuBuildHostNetStr(virDomainNetDefPtr net, virQEMUDriverPtr driver, - virQEMUCapsPtr qemuCaps, char type_sep, int vlan, const char *tapfd, diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index de9edd4..6891efd 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -729,21 +729,12 @@ int qemuDomainAttachNetDevice(virConnectPtr conn,
if (actualType == VIR_DOMAIN_NET_TYPE_BRIDGE || actualType == VIR_DOMAIN_NET_TYPE_NETWORK) { - /* - * If type=bridge then we attempt to allocate the tap fd here only if - * running under a privilged user or -netdev bridge option is not - * supported. - */ - if (actualType == VIR_DOMAIN_NET_TYPE_NETWORK || - cfg->privileged || - (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_NETDEV_BRIDGE))) { - if ((tapfd = qemuNetworkIfaceConnect(vm->def, conn, driver, net, - priv->qemuCaps)) < 0) - goto cleanup; - iface_connected = true; - if (qemuOpenVhostNet(vm->def, net, priv->qemuCaps, &vhostfd) < 0) - goto cleanup; - } + if ((tapfd = qemuNetworkIfaceConnect(vm->def, conn, driver, net, + priv->qemuCaps)) < 0) + goto cleanup; + iface_connected = true; + if (qemuOpenVhostNet(vm->def, net, priv->qemuCaps, &vhostfd) < 0) + goto cleanup; } else if (actualType == VIR_DOMAIN_NET_TYPE_DIRECT) { if ((tapfd = qemuPhysIfaceConnect(vm->def, driver, net, priv->qemuCaps, @@ -803,12 +794,12 @@ int qemuDomainAttachNetDevice(virConnectPtr conn,
if (virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_NETDEV) && virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_DEVICE)) { - if (!(netstr = qemuBuildHostNetStr(net, driver, priv->qemuCaps, + if (!(netstr = qemuBuildHostNetStr(net, driver, ',', -1, tapfd_name, vhostfd_name))) goto cleanup; } else { - if (!(netstr = qemuBuildHostNetStr(net, driver, priv->qemuCaps, + if (!(netstr = qemuBuildHostNetStr(net, driver, ' ', vlan, tapfd_name, vhostfd_name))) goto cleanup;
I still don't like using qemu-bridge-helper, but this is better than the alternative of having qemu call it (although, due to the way that process capabilities works, we are unable to prevent a rogue qemu started by unprivileged libvirtd from calling it :-( ACK to this patch (I think I would prefer you left the qemuCaps arg in, but others may disagree with me.)

On 04/18/2013 11:32 AM, Laine Stump wrote:
On 03/25/2013 10:25 AM, Paolo Bonzini wrote:
<source type='bridge'> uses a helper application to do the necessary TUN/TAP setup to use an existing network bridge, thus letting unprivileged users use TUN/TAP interfaces.
@@ -3746,7 +3828,6 @@ error: char * qemuBuildHostNetStr(virDomainNetDefPtr net, virQEMUDriverPtr driver, - virQEMUCapsPtr qemuCaps,
qemuCaps might again become useful for this function in the future, so you may want to leave it here (marked as ATTRIBUTE_UNUSED) to reduce code churn.
For that matter, do we need a qemu capability bit that says whether the bridge helper even exists, or do you give a nice error message to someone using qemu:///session on a machine with too-old qemu without the need for a capability bit??
I still don't like using qemu-bridge-helper, but this is better than the alternative of having qemu call it (although, due to the way that process capabilities works, we are unable to prevent a rogue qemu started by unprivileged libvirtd from calling it :-(
ACK to this patch (I think I would prefer you left the qemuCaps arg in, but others may disagree with me.)
I haven't pushed 4 or 5 (4 needed an OOM fix, and there's the question of whether a capability bit is still useful). I don't have a strong preference whether to leave qemuCaps in; we have version control to inspect if we need to figure out how to add it back in later. I prefer to limit ATTRIBUTE_UNUSED markers to functions that have a required signature (generally, callbacks to be called from some other generic code site); but this doesn't feel like we have any reason to be locked into leaving qemuCaps in the interface. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

[snip]
I still don't like using qemu-bridge-helper, but this is better than the alternative of having qemu call it (although, due to the way that process capabilities works, we are unable to prevent a rogue qemu started by unprivileged libvirtd from calling it :-(
Maybe we can introduce a tighter seccomp sandbox environment that doesn't allow the QEMU process to call exec(), open(), socket() (and anything else?) on top of the syscalls that are already not included in the -sandbox whitelist. This would require fd's to be passed from libvirt. Eduardo's going to work on adding functionality in this area in case you have any suggestions. -- Regards, Corey Bryant
ACK to this patch (I think I would prefer you left the qemuCaps arg in, but others may disagree with me.)
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On Fri, Apr 19, 2013 at 09:47:05AM -0400, Corey Bryant wrote:
[snip]
I still don't like using qemu-bridge-helper, but this is better than the alternative of having qemu call it (although, due to the way that process capabilities works, we are unable to prevent a rogue qemu started by unprivileged libvirtd from calling it :-(
Maybe we can introduce a tighter seccomp sandbox environment that doesn't allow the QEMU process to call exec(), open(), socket() (and anything else?) on top of the syscalls that are already not included in the -sandbox whitelist. This would require fd's to be passed from libvirt. Eduardo's going to work on adding functionality in this area in case you have any suggestions.
I'd certainly like to see us have a profile that prevents execve() in the near future. Already today there's no reason why a QEMU managed by libvirt should need to exec anything. Eventually we'll get to the place where we can consider blocking open/socket too, but I fear that's still quite a way off in the distance. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 04/19/2013 09:51 AM, Daniel P. Berrange wrote:
On Fri, Apr 19, 2013 at 09:47:05AM -0400, Corey Bryant wrote:
[snip]
I still don't like using qemu-bridge-helper, but this is better than the alternative of having qemu call it (although, due to the way that process capabilities works, we are unable to prevent a rogue qemu started by unprivileged libvirtd from calling it :-(
Maybe we can introduce a tighter seccomp sandbox environment that doesn't allow the QEMU process to call exec(), open(), socket() (and anything else?) on top of the syscalls that are already not included in the -sandbox whitelist. This would require fd's to be passed from libvirt. Eduardo's going to work on adding functionality in this area in case you have any suggestions.
I'd certainly like to see us have a profile that prevents execve() in the near future. Already today there's no reason why a QEMU managed by libvirt should need to exec anything. Eventually we'll get to the place where we can consider blocking open/socket too, but I fear that's still quite a way off in the distance.
Daniel
Thanks for the input. The problem we run into next is that QEMU already calls execve() so I assume we'd need to support 2 seccomp syscall whitelists, the existing whitelist that allows execve() and another that doesn't. Perhaps we can have a -sandbox,strict that ends up being a much more locked down environment than the current default. Or perhaps we can allow libvirt to define and pass a whitelist to QEMU. -- Regards, Corey Bryant

On Fri, Apr 19, 2013 at 10:05:33AM -0400, Corey Bryant wrote:
On 04/19/2013 09:51 AM, Daniel P. Berrange wrote:
On Fri, Apr 19, 2013 at 09:47:05AM -0400, Corey Bryant wrote:
[snip]
I still don't like using qemu-bridge-helper, but this is better than the alternative of having qemu call it (although, due to the way that process capabilities works, we are unable to prevent a rogue qemu started by unprivileged libvirtd from calling it :-(
Maybe we can introduce a tighter seccomp sandbox environment that doesn't allow the QEMU process to call exec(), open(), socket() (and anything else?) on top of the syscalls that are already not included in the -sandbox whitelist. This would require fd's to be passed from libvirt. Eduardo's going to work on adding functionality in this area in case you have any suggestions.
I'd certainly like to see us have a profile that prevents execve() in the near future. Already today there's no reason why a QEMU managed by libvirt should need to exec anything. Eventually we'll get to the place where we can consider blocking open/socket too, but I fear that's still quite a way off in the distance.
Daniel
Thanks for the input. The problem we run into next is that QEMU already calls execve() so I assume we'd need to support 2 seccomp syscall whitelists, the existing whitelist that allows execve() and another that doesn't. Perhaps we can have a -sandbox,strict that ends up being a much more locked down environment than the current default. Or perhaps we can allow libvirt to define and pass a whitelist to QEMU.
I think it is inevitable that we will need to have multiple seccomp whitelist profiles, because we can't assume that everyone who wants to use seccomp with QEMU, will also be using libvirt. Minimising the number of profiles we need to maintain though is key, perhaps to just 2 or 3 profiles. One that is fairly permissive to allow all QEMU features to 'just work', and one+ that is strict requiring apps to avoid certain features in favour of FD passing. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Il 18/04/2013 19:32, Laine Stump ha scritto:
On 03/25/2013 10:25 AM, Paolo Bonzini wrote:
<source type='bridge'> uses a helper application to do the necessary TUN/TAP setup to use an existing network bridge, thus letting unprivileged users use TUN/TAP interfaces.
However, libvirt should be preventing QEMU from running any setuid programs at all, which would include this helper program. From a security POV, any setuid helper needs to be run by libvirtd itself, not QEMU.
This is what this patch does. libvirt now invokes the setuid helper, gets the TAP fd and then passes it to QEMU in the normal manner. The path to the helper is specified in qemu.conf.
As a small advantage, this adds a <target dev='tap0'/> element to the XML of an active domain using <interface type='bridge'>.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- src/qemu/qemu_command.c | 133 +++++++++++++++++++++++++++++++++++------------- src/qemu/qemu_command.h | 1 - src/qemu/qemu_hotplug.c | 25 +++------ 3 files changed, 106 insertions(+), 53 deletions(-)
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index a0c278f..fa31102 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -28,12 +28,14 @@ #include "qemu_capabilities.h" #include "qemu_bridge_filter.h" #include "cpu/cpu.h" +#include "passfd.h" #include "viralloc.h" #include "virlog.h" #include "virarch.h" #include "virerror.h" #include "virutil.h" #include "virfile.h" +#include "virnetdev.h" #include "virstring.h" #include "viruuid.h" #include "c-ctype.h" @@ -46,6 +48,9 @@ #include "base64.h" #include "device_conf.h" #include "virstoragefile.h" +#if defined(__linux__) +# include <linux/capability.h> +#endif
#include <sys/stat.h> #include <fcntl.h> @@ -193,6 +198,77 @@ error: }
+/** + * qemuCreateInBridgePortWithHelper: + * @cfg: the configuration object in which the helper name is looked up + * @brname: the bridge name + * @ifname: the returned interface name + * @macaddr: the returned MAC address + * @tapfd: file descriptor return value for the new tap device + * @flags: OR of virNetDevTapCreateFlags: + + * VIR_NETDEV_TAP_CREATE_VNET_HDR + * - Enable IFF_VNET_HDR on the tap device + * + * This function creates a new tap device on a bridge using an external + * helper. The final name for the bridge will be stored in @ifname. + * + * Returns 0 in case of success or -1 on failure + */ +static int qemuCreateInBridgePortWithHelper(virQEMUDriverConfigPtr cfg, + const char *brname, + char **ifname, + int *tapfd, + unsigned int flags) +{ + virCommandPtr cmd; + int status; + int pair[2] = { -1, -1 }; + + if ((flags & ~VIR_NETDEV_TAP_CREATE_VNET_HDR) != VIR_NETDEV_TAP_CREATE_IFUP) + return -1; + + if (socketpair(AF_UNIX, SOCK_STREAM, 0, pair) < 0) { + virReportSystemError(errno, "%s", _("failed to create socket")); + return -1; + } + + cmd = virCommandNew(cfg->bridgeHelperName); + if (flags & VIR_NETDEV_TAP_CREATE_VNET_HDR) + virCommandAddArgFormat(cmd, "--use-vnet"); + virCommandAddArgFormat(cmd, "--br=%s", brname); + virCommandAddArgFormat(cmd, "--fd=%d", pair[1]); + virCommandTransferFD(cmd, pair[1]); + virCommandClearCaps(cmd); +#ifdef CAP_NET_ADMIN + virCommandAllowCap(cmd, CAP_NET_ADMIN);
I thought you had said that attempts to set caps would fail because CAP_SETPCAP was cleared for the non-privileged libvirtd.
I still prefer to note down the capabilities. This way running libvirt with CAP_SETPCAP as a permitted file capability (and only that one + the effective bit set) should do the right thing. I tested this by forcing this path for the system libvirt.
+#endif + if (virCommandRunAsync(cmd, NULL) < 0) { + *tapfd = -1; + goto out; + } + + do { + *tapfd = recvfd(pair[0], 0); + } while (*tapfd < 0 && errno == EINTR); + if (*tapfd < 0) { + virReportSystemError(errno, "%s", + _("failed to retrieve file descriptor for interface")); + goto out; + } + + if (virNetDevTapGetName(*tapfd, ifname) < 0 || + virCommandWait(cmd, &status) < 0) { + VIR_FORCE_CLOSE(*tapfd); + *tapfd = -1; + } + +out:
We prefer to name these labels "cleanup" rather that "out" (although there are some occurrences of "out" still in the code).
Ok. Paolo
+ virCommandFree(cmd); + VIR_FORCE_CLOSE(pair[0]); + return *tapfd < 0 ? -1 : 0; +} + int qemuNetworkIfaceConnect(virDomainDefPtr def, virConnectPtr conn, @@ -270,11 +346,17 @@ qemuNetworkIfaceConnect(virDomainDefPtr def, tap_create_flags |= VIR_NETDEV_TAP_CREATE_VNET_HDR; }
- err = virNetDevTapCreateInBridgePort(brname, &net->ifname, &net->mac, - def->uuid, &tapfd, - virDomainNetGetActualVirtPortProfile(net), - virDomainNetGetActualVlan(net), - tap_create_flags); + if (cfg->privileged) + err = virNetDevTapCreateInBridgePort(brname, &net->ifname, &net->mac, + def->uuid, &tapfd, + virDomainNetGetActualVirtPortProfile(net), + virDomainNetGetActualVlan(net), + tap_create_flags); + else + err = qemuCreateInBridgePortWithHelper(cfg, brname, + &net->ifname, + &tapfd, tap_create_flags); + virDomainAuditNetDevice(def, net, "/dev/net/tun", tapfd >= 0); if (err < 0) { if (template_ifname) @@ -3746,7 +3828,6 @@ error: char * qemuBuildHostNetStr(virDomainNetDefPtr net, virQEMUDriverPtr driver, - virQEMUCapsPtr qemuCaps,
qemuCaps might again become useful for this function in the future, so you may want to leave it here (marked as ATTRIBUTE_UNUSED) to reduce code churn.
char type_sep, int vlan, const char *tapfd, @@ -3755,7 +3836,6 @@ qemuBuildHostNetStr(virDomainNetDefPtr net, bool is_tap = false; virBuffer buf = VIR_BUFFER_INITIALIZER; enum virDomainNetType netType = virDomainNetGetActualType(net); - const char *brname = NULL; virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver);
if (net->script && netType != VIR_DOMAIN_NET_TYPE_ETHERNET) { @@ -3773,14 +3853,6 @@ qemuBuildHostNetStr(virDomainNetDefPtr net, * through, -net tap,fd */ case VIR_DOMAIN_NET_TYPE_BRIDGE: - if (!cfg->privileged && - virQEMUCapsGet(qemuCaps, QEMU_CAPS_NETDEV_BRIDGE)) { - brname = virDomainNetGetActualBridgeName(net); - virBufferAsprintf(&buf, "bridge%cbr=%s", type_sep, brname); - type_sep = ','; - is_tap = true; - break; - } case VIR_DOMAIN_NET_TYPE_NETWORK: case VIR_DOMAIN_NET_TYPE_DIRECT: virBufferAsprintf(&buf, "tap%cfd=%s", type_sep, tapfd); @@ -6681,26 +6753,17 @@ qemuBuildCommandLine(virConnectPtr conn,
if (actualType == VIR_DOMAIN_NET_TYPE_NETWORK || actualType == VIR_DOMAIN_NET_TYPE_BRIDGE) { - /* - * If type='bridge' then we attempt to allocate the tap fd here only if - * running under a privilged user or -netdev bridge option is not - * supported. - */ - if (actualType == VIR_DOMAIN_NET_TYPE_NETWORK || - cfg->privileged || - (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_NETDEV_BRIDGE))) { - int tapfd = qemuNetworkIfaceConnect(def, conn, driver, net, - qemuCaps); - if (tapfd < 0) - goto error; + int tapfd = qemuNetworkIfaceConnect(def, conn, driver, net, + qemuCaps); + if (tapfd < 0) + goto error;
- last_good_net = i; - virCommandTransferFD(cmd, tapfd); + last_good_net = i; + virCommandTransferFD(cmd, tapfd);
- if (snprintf(tapfd_name, sizeof(tapfd_name), "%d", - tapfd) >= sizeof(tapfd_name)) - goto no_memory; - } + if (snprintf(tapfd_name, sizeof(tapfd_name), "%d", + tapfd) >= sizeof(tapfd_name)) + goto no_memory; } else if (actualType == VIR_DOMAIN_NET_TYPE_DIRECT) { int tapfd = qemuPhysIfaceConnect(def, driver, net, qemuCaps, vmop); @@ -6744,7 +6807,7 @@ qemuBuildCommandLine(virConnectPtr conn, if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_NETDEV) && virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE)) { virCommandAddArg(cmd, "-netdev"); - if (!(host = qemuBuildHostNetStr(net, driver, qemuCaps, + if (!(host = qemuBuildHostNetStr(net, driver, ',', vlan, tapfd_name, vhostfd_name))) goto error; @@ -6768,7 +6831,7 @@ qemuBuildCommandLine(virConnectPtr conn, if (!(virQEMUCapsGet(qemuCaps, QEMU_CAPS_NETDEV) && virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE))) { virCommandAddArg(cmd, "-net"); - if (!(host = qemuBuildHostNetStr(net, driver, qemuCaps, + if (!(host = qemuBuildHostNetStr(net, driver, ',', vlan, tapfd_name, vhostfd_name))) goto error; diff --git a/src/qemu/qemu_command.h b/src/qemu/qemu_command.h index 17687f4..267a96e 100644 --- a/src/qemu/qemu_command.h +++ b/src/qemu/qemu_command.h @@ -71,7 +71,6 @@ qemuBuildChrDeviceStr (virDomainChrDefPtr serial, /* With vlan == -1, use netdev syntax, else old hostnet */ char * qemuBuildHostNetStr(virDomainNetDefPtr net, virQEMUDriverPtr driver, - virQEMUCapsPtr qemuCaps, char type_sep, int vlan, const char *tapfd, diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index de9edd4..6891efd 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -729,21 +729,12 @@ int qemuDomainAttachNetDevice(virConnectPtr conn,
if (actualType == VIR_DOMAIN_NET_TYPE_BRIDGE || actualType == VIR_DOMAIN_NET_TYPE_NETWORK) { - /* - * If type=bridge then we attempt to allocate the tap fd here only if - * running under a privilged user or -netdev bridge option is not - * supported. - */ - if (actualType == VIR_DOMAIN_NET_TYPE_NETWORK || - cfg->privileged || - (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_NETDEV_BRIDGE))) { - if ((tapfd = qemuNetworkIfaceConnect(vm->def, conn, driver, net, - priv->qemuCaps)) < 0) - goto cleanup; - iface_connected = true; - if (qemuOpenVhostNet(vm->def, net, priv->qemuCaps, &vhostfd) < 0) - goto cleanup; - } + if ((tapfd = qemuNetworkIfaceConnect(vm->def, conn, driver, net, + priv->qemuCaps)) < 0) + goto cleanup; + iface_connected = true; + if (qemuOpenVhostNet(vm->def, net, priv->qemuCaps, &vhostfd) < 0) + goto cleanup; } else if (actualType == VIR_DOMAIN_NET_TYPE_DIRECT) { if ((tapfd = qemuPhysIfaceConnect(vm->def, driver, net, priv->qemuCaps, @@ -803,12 +794,12 @@ int qemuDomainAttachNetDevice(virConnectPtr conn,
if (virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_NETDEV) && virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_DEVICE)) { - if (!(netstr = qemuBuildHostNetStr(net, driver, priv->qemuCaps, + if (!(netstr = qemuBuildHostNetStr(net, driver, ',', -1, tapfd_name, vhostfd_name))) goto cleanup; } else { - if (!(netstr = qemuBuildHostNetStr(net, driver, priv->qemuCaps, + if (!(netstr = qemuBuildHostNetStr(net, driver, ' ', vlan, tapfd_name, vhostfd_name))) goto cleanup;
I still don't like using qemu-bridge-helper, but this is better than the alternative of having qemu call it (although, due to the way that process capabilities works, we are unable to prevent a rogue qemu started by unprivileged libvirtd from calling it :-(
ACK to this patch (I think I would prefer you left the qemuCaps arg in, but others may disagree with me.)

On 03/25/2013 10:25 AM, Paolo Bonzini wrote:
The <interface type='bridge'> is working mostly because of a bad design decision in Linux. Ideally, QEMU would run with an empty capability bounding set and would not be able to do any privileged operation (not even by running a helper program). This is not the case because dropping capabilities from the bounding set requires a capability of its own, CAP_SETPCAP; thus QEMU does *not* run with an empty bounding set if invoked via qemu:///session.
Ewww. So what you're saying is that the qemu that's run from qemu:///system is more locked down (and thus "more secure") than the qemu that's run from qemu:///session? Basically this qemu can run any setuid application it likes, and there's nothing that we can do about it.

Il 28/03/2013 20:30, Laine Stump ha scritto:
The <interface type='bridge'> is working mostly because of a bad design decision in Linux. Ideally, QEMU would run with an empty capability bounding set and would not be able to do any privileged operation (not even by running a helper program). This is not the case because dropping capabilities from the bounding set requires a capability of its own, CAP_SETPCAP; thus QEMU does *not* run with an empty bounding set if invoked via qemu:///session.
Ewww. So what you're saying is that the qemu that's run from qemu:///system is more locked down (and thus "more secure") than the qemu that's run from qemu:///session? Basically this qemu can run any setuid application it likes, and there's nothing that we can do about it.
Yes. However, seccompv2 can still prevent execve to be executed by qemu. Paolo

Il 25/03/2013 15:25, Paolo Bonzini ha scritto:
The <interface type='bridge'> is working mostly because of a bad design decision in Linux. Ideally, QEMU would run with an empty capability bounding set and would not be able to do any privileged operation (not even by running a helper program). This is not the case because dropping capabilities from the bounding set requires a capability of its own, CAP_SETPCAP; thus QEMU does *not* run with an empty bounding set if invoked via qemu:///session.
This series lets libvirtd invoke the privileged helper program on its own, which is a cleaner design that would work even if the above Linux bug was not there. Also, this adds a <target dev='tap0'/> element to the XML of an active domain using <interface type='bridge'>.
libvirt now needs to know the path to the helper (patch 3), and must not set permitted/effective capabilities on children when running unprivileged (patches 1/2). Apart from this, the recvfd and virCommand APIs make the task almost trivial.
Paolo Bonzini (5): util: simplify virSetUIDGIDWithCaps util: allow using virCommandAllowCap with setuid helpers qemu_conf: add new configuration key bridge_helper virnetdevtap: add virNetDevTapGetName qemu: launch bridge helper from libvirtd
src/libvirt_private.syms | 1 + src/qemu/libvirtd_qemu.aug | 1 + src/qemu/qemu.conf | 8 +++ src/qemu/qemu_command.c | 133 +++++++++++++++++++++++++++---------- src/qemu/qemu_command.h | 1 - src/qemu/qemu_conf.c | 3 + src/qemu/qemu_conf.h | 1 + src/qemu/qemu_hotplug.c | 25 +++---- src/qemu/test_libvirtd_qemu.aug.in | 1 + src/util/virnetdevtap.c | 33 +++++++++ src/util/virnetdevtap.h | 3 + src/util/virutil.c | 36 +++++++--- 12 files changed, 183 insertions(+), 63 deletions(-)
Ping? Paolo
participants (5)
-
Corey Bryant
-
Daniel P. Berrange
-
Eric Blake
-
Laine Stump
-
Paolo Bonzini