On 14.12.2016 18:02, Daniel P. Berrange wrote:
On Mon, Dec 12, 2016 at 05:52:54PM +0100, Michal Privoznik wrote:
> Prime time. When it comes to spawning the qemu process and
> relabelling all the devices it's going to touch, there's an inherent
> race with other applications in the system (e.g. udev). Instead
> of trying to convince udev not to touch libvirt-managed devices,
> we can create a separate mount namespace for qemu and mount
> our own /dev there. Of course this puts more work onto us, as we
> have to maintain the /dev files on each domain start and device
> hot(un-)plug. On the other hand, it also enhances security.
>
> From a technical POV, on domain startup the parent process
> (libvirtd) creates:
>
> /var/lib/libvirt/qemu/$domain.dev
> /var/lib/libvirt/qemu/$domain.devpts
>
> The child (which is going to be qemu eventually) calls unshare()
> to create a new mount namespace. From now on, any mount the child
> makes is invisible to the parent. The child then mounts tmpfs on
> $domain.dev (so that it still sees the original /dev from the host)
> and creates some devices (as explained in one of the previous
> patches). The devices have to be created exactly as they are on
> the host (including perms, seclabels, ACLs, ...). After that it
> moves the $domain.dev mount to /dev.
>
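The sequence in the paragraph above (unshare, tmpfs on the staging
directory, device creation, move to /dev) might look roughly like the
C sketch below. It is an illustration under stated assumptions, not
the code from the patch: the function names build_dev_path(),
clone_dev_node() and setup_private_dev() are hypothetical, only
/dev/null is cloned, seclabels, ACLs and the devpts handling are left
out, and the MS_SLAVE remount of "/" is an assumption about how to
keep the child's mounts from propagating back to the host. Running
setup_private_dev() needs root (CAP_SYS_ADMIN and CAP_MKNOD).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for unshare() and CLONE_NEWNS */
#endif
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: build "<state_dir>/<domain>.dev" into buf.
 * Returns 0 on success, -1 on an empty domain name or truncation. */
int build_dev_path(char *buf, size_t buflen,
                   const char *state_dir, const char *domain)
{
    int n;

    if (strlen(domain) == 0)
        return -1;
    n = snprintf(buf, buflen, "%s/%s.dev", state_dir, domain);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}

/* Hypothetical helper: recreate one host device node at new_path with
 * the same type, major:minor, owner and permission bits.
 * Seclabels and ACLs are NOT copied in this sketch. */
int clone_dev_node(const char *host_path, const char *new_path)
{
    struct stat sb;

    if (stat(host_path, &sb) < 0)
        return -1;
    if (!S_ISCHR(sb.st_mode) && !S_ISBLK(sb.st_mode)) {
        errno = EINVAL;
        return -1;
    }
    if (mknod(new_path, sb.st_mode, sb.st_rdev) < 0)
        return -1;
    if (chown(new_path, sb.st_uid, sb.st_gid) < 0)
        return -1;
    /* mknod() applies the umask, so restore the exact mode bits. */
    if (chmod(new_path, sb.st_mode & 07777) < 0)
        return -1;
    return 0;
}

/* Meant to run in the child before exec'ing qemu. staging is the
 * directory the parent created, e.g. the result of build_dev_path(). */
int setup_private_dev(const char *staging)
{
    char node[4096];
    int n;

    assert(staging != NULL);

    /* New mount namespace for this process only. */
    if (unshare(CLONE_NEWNS) < 0)
        return -1;
    /* Assumption: stop mount events from propagating to the host. */
    if (mount("none", "/", NULL, MS_SLAVE | MS_REC, NULL) < 0)
        return -1;
    /* The host /dev is still visible while the new one is populated. */
    if (mount("devfs", staging, "tmpfs", MS_NOSUID, "mode=755") < 0)
        return -1;

    n = snprintf(node, sizeof(node), "%s/null", staging);
    if (n < 0 || (size_t)n >= sizeof(node))
        return -1;
    if (clone_dev_node("/dev/null", node) < 0)
        return -1;

    /* Finally put the populated tmpfs in place of /dev. */
    if (mount(staging, "/dev", NULL, MS_MOVE, NULL) < 0)
        return -1;
    return 0;
}
```

A caller would fork(), have the child call setup_private_dev() on the
path returned by build_dev_path(), and then exec the emulator. Because
the move hides everything under the host /dev, mounts such as devpts
would have to be preserved separately.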
> What's the $domain.devpts mount there for, you ask? QEMU can
> create PTYs for some chardevs, and historically we have exposed the
> host ends in our domain XML, allowing users to connect to them.
> Therefore we must preserve the devpts mount so that it stays shared
> with the host's.
>
> To make this patch as small as possible, creation of the devices
> configured for the domain in question is implemented in the
> following patches.
>
> Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com>
> ---
>
> Diff to v2:
>
> - More /dev/* mounts are preserved
> - Fixed qemuDomainEnableNamespace return type and error reporting
>
> The whole series can be found here:
>
> https://github.com/zippy2/libvirt/tree/qemu_container_v4

ACK to the whole series from me. I've tested this and it works correctly
now, including with host-assigned USB devices. There are probably some
fun edge cases hiding in there, but we need more people testing to
see that.

Thank you. I've pushed these.
Michal