On Tue, 2007-01-16 at 22:28 +0000, Daniel P. Berrange wrote:
On Mon, Jan 15, 2007 at 08:06:18PM +0000, Mark McLoughlin wrote:
Since we've disappeared down a rat-hole with the other part of the thread,
here's an attempt to get back on-topic :-)
Indeed :-)
Since the user is privileged, another way to do without VDE is to mirror
the Xen case almost exactly, creating one tap device per guest, instead
of Xen's netback vif devices:
Sure. There is the argument that always using VDE is nicer because it's
consistent with the non-privileged and remotely connected network
versions.
As you say, though, this way is consistent with the Xen version.
> 3. An unprivileged user does exactly the same thing as (2).
>
> +-----------+                               +-----------+
> |   Guest   |          +----+----+          |   Guest   |
> |     A     |          |userspace|          |     B     |
> |   +---+   |          | network |          |   +---+   |
> |   |NIC|   |          |  stack  |          |   |NIC|   |
> +---+-+-+---+          +----+----+          +---+-+-+---+
>       ^       +-------+     |     +-------+       ^
>       |       |       | +---+---+ |       |       |
>       +------>+ VLAN0 +-+  VDE  +-+ VLAN0 +<------+
>               |       | +-------+ |       |
>               +-------+           +-------+
>
> Notes:
>
> * Similar to (2) except there can be no TAP device or bridge
> * The userspace network stack is implemented using
> slirpvde to provide a DHCP server and DNS proxy to the
> network, but also effectively a SNAT and DNAT router.
> * slirpvde implements ethernet, ip, tcp, udp, icmp, dhcp,
> tftp (etc.) in userspace. Completely crazy, but since
> the kernel apparently has no secure way to allow
> unprivileged users to leverage the kernel's network
> stack for this, then it must be done in userspace.
Is it practical to just have some kind of privileged proxy that would
merely create & configure the tap devices on behalf of the unprivileged
guests ? If we just created tap devices for any unprivileged guest, but
kept them disconnected from any real network device, would that still be
a big hole ?
Okay, to avoid a userspace network stack, you need a way to securely
allow guests running as unprivileged users to use the kernel's network
stack. That implies:
1) The packets/frames have to arrive on a network interface created
by the user (e.g. a TAP or SLIP iface)
2) It should not be possible to spoof as another host or adversely
affect the host's connectivity, or any other machine on the same
network as the host
3) slirp prevents spoofing by effectively translating the source
address of any packet which leaves the virtual network, just like a
router using SNAT
4) We can do the same thing by enabling IP forwarding and having all
packets forwarded by the host go through SNAT
5) The problem with that is what to do about packets not being
forwarded by the host, but which are destined for the host itself?
SNAT in PREROUTING might do it, but that's not allowed it seems.
6) We also have to worry about whether people could e.g. screw up the
host's ARP cache
7) We also have to worry about a DOS whereby someone creates lots of
network interfaces
And note, this isn't just about worrying about nasty guests. You have
to worry about what nasty users on the host could do with a setuid
helper like this.
It's certainly got to be "possible" ... but I don't yet feel I know
what all the bases are that need to be covered, never mind how we'd
cover them.
Or can we leverage QEMU's builtin SLIRP or other non-TAP networking modes
to construct something reasonable in userspace, without using VDE.
The general problem with any SLIRP derivative or similar is that it's
another network stack implementation. That makes me nervous for security,
performance, stability and portability reasons.
And as I found out, the case in point is that SLIRP currently has
buffer overflow vulnerabilities and isn't 64 bit clean.
> Virtual Networks will be implemented in libvirt. First, there will be an
> XML description of Virtual Networks e.g.:
>
>   <network id="0">
>     <name>Foo</name>
>     <uuid>596a5d2171f48fb2e068e2386a5c413e</uuid>
>     <listen address="172.31.0.5" port="1234" />
>     <connections>
>       <connection address="172.31.0.6" port="4321" />
>     </connections>
>     <dhcp enabled="true">
>       <ip address="10.0.0.1"
>           netmask="255.255.255.0"
>           start="10.0.0.128"
>           end="10.0.0.254" />
>     </dhcp>
>     <forwarding enabled="true">
>       <incoming default="deny">
>         <allow port="123" domain="foobar" destport="321" />
>       </incoming>
>       <outgoing default="allow">
>         <deny port="25" />
>       </outgoing>
>     </forwarding>
>   </network>
Got to also think how we connect guest domains to the virtual network.
Right, further on in the mail I said:
* Where is the connection between domains and networks in either
the API or the XML format? How is a domain associated with a
network? You put a bridge name in the <network> definition
and use that in the domain's <interface> definition? Or you put
the network name in the interface definition and have libvirt
look up the bridge name when creating the guest?
Currently we just have something really simple like
  <interface type="bridge">
    <source bridge='xenbr0'/>
    <mac address='00:11:22:33:44:55'/>
  </interface>
I guess we'd probably want to refer to the UUID of the network to map
it into the guest.
Well, the UUID isn't much good if you can't map it. So, it would
probably be the name and libvirt URI, right?
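To make the name-based option concrete, here is a minimal sketch of what
the domain side might look like. The type="network" value and the
network attribute on <source> are assumptions made up for illustration,
not an existing format; "Foo" is the network name from the example
definition earlier in this mail, and the MAC address is copied from the
bridge example above.

```xml
<!-- Hypothetical: type="network" and the network attribute are
     assumptions, not part of the current <interface> format -->
<interface type="network">
  <!-- libvirt would look up the bridge (or VDE switch) behind "Foo" -->
  <source network="Foo"/>
  <mac address="00:11:22:33:44:55"/>
</interface>
```

With something like this, the bridge name never appears in the domain
XML, so libvirt is free to resolve it when the guest is created.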
Oh, do we need to define a 'network 0' for the physical network of the
host machine - what if there are multiple host NICs - any conventions we
need to let us distinguish ? Maybe it's best to just refer to the host
network by using IP addresses - so we can deal better with the case where
a machine switches from eth0 -> eth1 (wired to wireless) but keeps the
same IP address, or some such.
Well, I think there should be a default virtual network defined
somehow. You shouldn't need to create one unless you want a second one.
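As a sketch only, a default network could reuse the format proposed
earlier in this mail with nothing but DHCP and outbound forwarding
turned on. The name "default" and the choice to leave out <uuid>,
<listen> and <connections> are assumptions; the addresses are simply
copied from the earlier example.

```xml
<!-- Hypothetical default network, using the element names from the
     proposed format; which elements may be omitted is an assumption -->
<network id="0">
  <name>default</name>
  <dhcp enabled="true">
    <ip address="10.0.0.1"
        netmask="255.255.255.0"
        start="10.0.0.128"
        end="10.0.0.254" />
  </dhcp>
  <forwarding enabled="true">
    <incoming default="deny" />
    <outgoing default="allow" />
  </forwarding>
</network>
```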
But remember that under the model I'm suggesting, guests connect
*either* to a virtual network or a physical network via a "shared
physical interface".
The shared physical interface just winds up being a bridge you enslave
the guest's interface to, so the easiest answer for that is that we
stick with the way it is right now for Xen and have QEMU create a TAP
device and enslave that to the bridge in this mode.
Dunno, it does need more thought/discussion ... I find the current
<interface> stuff quite strange now - e.g. "bridge" vs. "ethernet" types
and the bridge name is in <source> ?
Cheers,
Mark.