Re: [libvirt] migration of vnlink VMs

----- Original Message -----
From: "Laine Stump" <lstump@redhat.com> To: "Oved Ourfalli" <ovedo@redhat.com> Cc: "Ayal Baron" <abaron@redhat.com>, "Barak Azulay" <bazulay@redhat.com>, "Shahar Havivi" <shaharh@redhat.com>, "Itamar Heim" <iheim@redhat.com>, "Dan Kenigsberg" <danken@redhat.com> Sent: Thursday, April 28, 2011 10:20:35 AM Subject: Re: migration of vnlink VMs Oved,
Would it be okay to repost this message to the thread on libvir-list so that other parties can add their thoughts?
Of course. I'm sending my answer to the libvirt list.
On 04/27/2011 09:58 AM, Oved Ourfalli wrote:
Laine, hello.
We read your proposal for abstraction of the guest <--> host network connection in libvirt.
You have an open issue there regarding the vepa/vnlink attributes: "3) What about the parameters in the <virtualport> element that are currently used by vepa/vnlink. Do those belong with the host, or with the guest?"
The parameters for the virtualport element should be on the guest, and not the host, because a specific interface can run multiple profiles,
Are you talking about host interface or guest interface? If you mean that multiple different profiles can be used when connecting to a particular switch - as long as there are only a few different profiles, rather than each guest having its own unique profile, then it still seems better to have the port profile live with the network definition (and just define multiple networks, one for each port profile).
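As a rough sketch of that "one network per port profile" alternative (the network names and typeid values here are hypothetical, and it assumes a <virtualport> element inside <network>, which is only proposed later in this thread):

<network type='direct'>
  <name>red-network-typeA</name>
  <source mode='vepa'/>
  <virtualport type='802.1Qbg'>
    <parameters managerid='11' typeid='1193047' typeidversion='2'/>
  </virtualport>
</network>

<network type='direct'>
  <name>red-network-typeB</name>
  <source mode='vepa'/>
  <virtualport type='802.1Qbg'>
    <parameters managerid='11' typeid='1193048' typeidversion='2'/>
  </virtualport>
</network>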
The profile names can change regularly, so it looks like it will be better to put them at the guest level, so that the host network file won't have to be changed on all hosts once something changes in the profiles. Also, you would have a duplication of data, writing all the profile names on all the hosts that are connected to the vn-link/vepa switch.
so it will be a mistake to define a profile to be interface-specific on the host. Moreover, putting it at the guest level will enable us in the future (if supported by libvirt/qemu) to migrate a VM from a host with vepa/vnlink interfaces to another host with a bridge, for example.
It seems to me like doing exactly the opposite would make it easier to migrate to a host that used a different kind of switching (from vepa to vnlink, or from a bridged interface to vepa, etc), since the port profile required for a particular host's network would be at the host waiting to be used.
You are right, but we would want to have the option to prevent that from happening in case we don't want to allow it. We can make the ability to migrate between different network types configurable, and we would like an easy way to tell libvirt - "please allow/don't allow it".
So, in the networks at the host level you will have:

<network type='direct'>
  <name>red-network</name>
  <source mode='vepa'>
    <pool>
      <interface>
        <name>eth0</name>
        .....
      </interface>
      <interface>
        <name>eth4</name>
        .....
      </interface>
      <interface>
        <name>eth18</name>
        .....
      </interface>
    </pool>
  </source>
</network>
And in the guest you will have (for vepa):

<interface type='network'>
  <source network='red-network'/>
  <virtualport type="802.1Qbg">
    <parameters managerid="11" typeid="1193047" typeidversion="2"
                instanceid="09b11c53-8b5c-4eeb-8f00-d84eaa0aaa4f"/>
  </virtualport>
</interface>
Or (for vnlink):

<interface type='network'>
  <source network='red-network'/>
  <virtualport type="802.1Qbh">
    <parameters profile_name="profile1"/>
  </virtualport>
</interface>
This illustrates the problem I was wondering about - in your example it would not be possible for the guest to migrate from the host using a vepa switch to the host using a vnlink switch (and it would be possible
You are right. When trying to migrate between vepa and vnlink there will be missing attributes in each in case we leave it on the host.
to migrate to a host using a standard bridge only if the virtualport element was ignored). If the virtualport element lived with the network definition of red-network on each host, it could be migrated without problem.
The only problematic thing would be if any of the attributes within <parameters> was unique for each guest (I don't know anything about the individual attributes, but "instanceid" sounds like it might be different for each guest).
Then, when migrating from a vepa/vnlink host to another vepa/vnlink host containing red-network, the profile attributes will be available in the guest domain XML. In case the target host has a red-network which isn't vepa/vnlink, we want to be able to choose whether to make the use of the profile attributes optional (i.e., libvirt won't fail in case of migrating to a network of another type) or mandatory (i.e., libvirt will fail in case of migration to a non-vepa/vnlink network).
We have something similar in CPU flags:

<cpu match="exact">
  <model>qemu64</model>
  <topology sockets="S" cores="C" threads="T"/>
  <feature policy="require/optional/disable......" name="sse2"/>
</cpu>
In this analogy, does "CPU flags" == "mode (vepa/vnlink/bridge)" or does "CPU flags" == "virtualport parameters"? It seems like what you're wanting can be satisfied by simply not defining "red-network" on the hosts that don't have the proper networking setup available (maybe what you *really* want to call it is "red-vnlink-network").
What I meant to say there is that we would like to have the ability to say whether an attribute must be used or not. The issues you mention are indeed interesting. I'm cc-ing libvirt-list to see what other people think. Putting it on the guest will indeed make it problematic to migrate between networks that need different parameters (vnlink/vepa for example). Oved

On 04/28/2011 04:15 AM, Oved Ourfalli wrote:
Laine, hello.
We read your proposal for abstraction of the guest <--> host network connection in libvirt.
You have an open issue there regarding the vepa/vnlink attributes: "3) What about the parameters in the <virtualport> element that are currently used by vepa/vnlink. Do those belong with the host, or with the guest?"
The parameters for the virtualport element should be on the guest, and not the host, because a specific interface can run multiple profiles,
Are you talking about host interface or guest interface? If you mean that multiple different profiles can be used when connecting to a particular switch - as long as there are only a few different profiles, rather than each guest having its own unique profile, then it still seems better to have the port profile live with the network definition (and just define multiple networks, one for each port profile).
The profile names can change regularly, so it looks like it will be better to put them at the guest level, so that the host network file won't have to be changed on all hosts once something changes in the profiles.
Also, you would have a duplication of data, writing all the profile names on all the hosts that are connected to the vn-link/vepa switch.
But is it potentially the same for many/all guests, or is it necessarily different for every guest? If it's the former, then do you have more guests, or more hosts?
so it will be a mistake to define a profile to be interface-specific on the host. Moreover, putting it at the guest level will enable us in the future (if supported by libvirt/qemu) to migrate a VM from a host with vepa/vnlink interfaces to another host with a bridge, for example.

It seems to me like doing exactly the opposite would make it easier to migrate to a host that used a different kind of switching (from vepa to vnlink, or from a bridged interface to vepa, etc), since the port profile required for a particular host's network would be at the host waiting to be used.

You are right, but we would want to have the option to prevent that from happening in case we don't want to allow it. We can make the ability to migrate between different network types configurable, and we would like an easy way to tell libvirt - "please allow/don't allow it".
I *think* what you're getting at is this situation:

HostA has a group of interfaces that are connected to a vepa-capable switch, HostB has a group of interfaces connected to a vnlink-capable switch. Guest1 is allowed to connect either via a vnlink switch or a vepa switch, but Guest2 should only use vepa.

In that case, HostA would have a network that had a pool of interfaces and type "vepa", while HostB would have a pool of interfaces and a type "vnlink". Guest1 could be accommodated by giving both networks the same name, or Guest2 could be accommodated by giving each network a different name (when migrating, if the dest. host doesn't have the desired network, the migration would fail). However, using just the network naming, it wouldn't be possible to allow both.

I don't think keeping the virtualport parameters only with the guest would help (or hurt) this though. What would be needed would be to have the information about network type *optionally* specified in the guest interface config (as well as in the network config); if present the migration would only succeed if the given network on the dest host matched the given type (and parameters, if any) in the guest config.
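A sketch of that situation under the schema being discussed (interface names are hypothetical; the mode value used for the vnlink side is an assumption - macvtap 'private' mode is commonly paired with 802.1Qbh, but the thread only calls it a "vnlink" network):

<!-- HostA: pool of interfaces attached to a vepa-capable switch -->
<network type='direct'>
  <name>red-network</name>
  <source mode='vepa'>
    <pool>
      <interface name='eth10'/>
      <interface name='eth11'/>
    </pool>
  </source>
</network>

<!-- HostB: same network name, pool attached to a vnlink-capable switch -->
<network type='direct'>
  <name>red-network</name>
  <source mode='private'>
    <pool>
      <interface name='eth10'/>
      <interface name='eth11'/>
    </pool>
  </source>
</network>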
So, in the networks at the host level you will have:

<network type='direct'>
  <name>red-network</name>
  <source mode='vepa'>
    <pool>
      <interface>
        <name>eth0</name>
        .....
      </interface>
      <interface>
        <name>eth4</name>
        .....
      </interface>
      <interface>
        <name>eth18</name>
        .....
      </interface>
    </pool>
  </source>
</network>
And in the guest you will have (for vepa):

<interface type='network'>
  <source network='red-network'/>
  <virtualport type="802.1Qbg">
    <parameters managerid="11" typeid="1193047" typeidversion="2"
                instanceid="09b11c53-8b5c-4eeb-8f00-d84eaa0aaa4f"/>
  </virtualport>
</interface>
Or (for vnlink):

<interface type='network'>
  <source network='red-network'/>
  <virtualport type="802.1Qbh">
    <parameters profile_name="profile1"/>
  </virtualport>
</interface>
What would the interface for a 2nd guest of each type look like? Could it be identical? Or might some parameters change for every single guest? Perhaps it would be best to have virtualport parameters on both network and guest interface XML, and merge the two to arrive at what's used (the network definition could contain all the attributes that would be common to all guests using that network on that host, and the guest interface definition would contain extra parameters specific to that guest. In the case of a parameter being specified in both places, if they were not identical, the migration would fail).
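A sketch of that merge idea, reusing the values from the examples above: the host network carries the attributes shared by all guests, the guest interface carries only the per-guest piece, and the effective <virtualport> is the union of the two.

<!-- on the host, in the network definition: shared attributes -->
<network type='direct'>
  <name>red-network</name>
  <source mode='vepa'/>
  <virtualport type='802.1Qbg'>
    <parameters managerid='11' typeid='1193047' typeidversion='2'/>
  </virtualport>
</network>

<!-- in the guest: only the per-guest attribute -->
<interface type='network'>
  <source network='red-network'/>
  <virtualport type='802.1Qbg'>
    <parameters instanceid='09b11c53-8b5c-4eeb-8f00-d84eaa0aaa4f'/>
  </virtualport>
</interface>

<!-- effective settings used at connect time: the union of the two -->
<virtualport type='802.1Qbg'>
  <parameters managerid='11' typeid='1193047' typeidversion='2'
              instanceid='09b11c53-8b5c-4eeb-8f00-d84eaa0aaa4f'/>
</virtualport>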
This illustrates the problem I was wondering about - in your example it would not be possible for the guest to migrate from the host using a vepa switch to the host using a vnlink switch (and it would be possible

You are right. When trying to migrate between vepa and vnlink there will be missing attributes in each in case we leave it on the host.
(you mean if we leave the config on the *guest*, I guess...)
to migrate to a host using a standard bridge only if the virtualport element was ignored). If the virtualport element lived with the network definition of red-network on each host, it could be migrated without problem.
The only problematic thing would be if any of the attributes within <parameters> was unique for each guest (I don't know anything about the individual attributes, but "instanceid" sounds like it might be different for each guest).
Then, when migrating from a vepa/vnlink host to another vepa/vnlink host containing red-network, the profile attributes will be available in the guest domain XML. In case the target host has a red-network which isn't vepa/vnlink, we want to be able to choose whether to make the use of the profile attributes optional (i.e., libvirt won't fail in case of migrating to a network of another type) or mandatory (i.e., libvirt will fail in case of migration to a non-vepa/vnlink network).
We have something similar in CPU flags:

<cpu match="exact">
  <model>qemu64</model>
  <topology sockets="S" cores="C" threads="T"/>
  <feature policy="require/optional/disable......" name="sse2"/>
</cpu>

In this analogy, does "CPU flags" == "mode (vepa/vnlink/bridge)" or does "CPU flags" == "virtualport parameters"? It seems like what you're wanting can be satisfied by simply not defining "red-network" on the hosts that don't have the proper networking setup available (maybe what you *really* want to call it is "red-vnlink-network").

What I meant to say there is that we would like to have the ability to say whether an attribute must be used or not.
Sure, it sounds useful. Would what I outlined above be sufficient? (It would allow you to say "this guest must have a vepa network connection" or "this guest can have any network connection, as long as it's named 'red-network'". It *won't* allow saying "this guest must have vepa or vnlink, bridge is not allowed, even if the network name is the same".) You could also put most of the config with the host network definition, but allow, e.g., instanceid to be specified in the guest config.
The issues you mention are indeed interesting. I'm cc-ing libvirt-list to see what other people think. Putting it on the guest will indeed make it problematic to migrate between networks that need different parameters (vnlink/vepa for example).

See my comments below.

Thank you, Oved

----- Original Message -----
From: "Laine Stump" <laine@laine.org>
To: libvir-list@redhat.com
Sent: Friday, April 29, 2011 3:45:50 PM
Subject: Re: [libvirt] migration of vnlink VMs

On 04/28/2011 04:15 AM, Oved Ourfalli wrote:
Laine, hello.
We read your proposal for abstraction of the guest <--> host network connection in libvirt.
You have an open issue there regarding the vepa/vnlink attributes: "3) What about the parameters in the <virtualport> element that are currently used by vepa/vnlink. Do those belong with the host, or with the guest?"
The parameters for the virtualport element should be on the guest, and not the host, because a specific interface can run multiple profiles,
Are you talking about host interface or guest interface? If you mean that multiple different profiles can be used when connecting to a particular switch - as long as there are only a few different profiles, rather than each guest having its own unique profile, then it still seems better to have the port profile live with the network definition (and just define multiple networks, one for each port profile).
The profile names can change regularly, so it looks like it will be better to put them at the guest level, so that the host network file won't have to be changed on all hosts once something changes in the profiles.
Also, you would have a duplication of data, writing all the profile names on all the hosts that are connected to the vn-link/vepa switch.
But is it potentially the same for many/all guests, or is it necessarily different for every guest? If it's the former, then do you have more guests, or more hosts?
I guess it will be the same for many guests. There will be some profiles, and each group of guests will use the same profile, according to its demands.
so it will be a mistake to define a profile to be interface-specific on the host. Moreover, putting it at the guest level will enable us in the future (if supported by libvirt/qemu) to migrate a VM from a host with vepa/vnlink interfaces to another host with a bridge, for example.

It seems to me like doing exactly the opposite would make it easier to migrate to a host that used a different kind of switching (from vepa to vnlink, or from a bridged interface to vepa, etc), since the port profile required for a particular host's network would be at the host waiting to be used.

You are right, but we would want to have the option to prevent that from happening in case we don't want to allow it. We can make the ability to migrate between different network types configurable, and we would like an easy way to tell libvirt - "please allow/don't allow it".
I *think* what you're getting at is this situation:
HostA has a group of interfaces that are connected to a vepa-capable switch, HostB has a group of interfaces connected to a vnlink-capable switch. Guest1 is allowed to connect either via a vnlink switch or a vepa switch, but Guest2 should only use vepa.
In that case, HostA would have a network that had a pool of interfaces and type "vepa", while HostB would have a pool of interfaces and a type "vnlink". Guest1 could be accommodated by giving both networks the same name, or Guest2 could be accommodated by giving each network a different name (when migrating, if the dest. host doesn't have the desired network, the migration would fail). However, using just the network naming, it wouldn't be possible to allow both.
I don't think keeping the virtualport parameters only with the guest would help (or hurt) this though. What would be needed would be to have the information about network type *optionally* specified in the guest interface config (as well as in the network config); if present the migration would only succeed if the given network on the dest host matched the given type (and parameters, if any) in the guest config.
That would be great. It will enable the flexibility we need.
So, in the networks at the host level you will have:

<network type='direct'>
  <name>red-network</name>
  <source mode='vepa'>
    <pool>
      <interface>
        <name>eth0</name>
        .....
      </interface>
      <interface>
        <name>eth4</name>
        .....
      </interface>
      <interface>
        <name>eth18</name>
        .....
      </interface>
    </pool>
  </source>
</network>
And in the guest you will have (for vepa):

<interface type='network'>
  <source network='red-network'/>
  <virtualport type="802.1Qbg">
    <parameters managerid="11" typeid="1193047" typeidversion="2"
                instanceid="09b11c53-8b5c-4eeb-8f00-d84eaa0aaa4f"/>
  </virtualport>
</interface>
Or (for vnlink):

<interface type='network'>
  <source network='red-network'/>
  <virtualport type="802.1Qbh">
    <parameters profile_name="profile1"/>
  </virtualport>
</interface>
What would the interface for a 2nd guest of each type look like? Could it be identical? Or might some parameters change for every single guest?
For vn-link it will be the same, just the profile_name. As for vepa, the instanceid is VM-specific, so it should be on the guest (taken from http://libvirt.org/formatdomain.html):

"managerid - The VSI Manager ID identifies the database containing the VSI type and instance definitions. This is an integer value and the value 0 is reserved.
typeid - The VSI Type ID identifies a VSI type characterizing the network access. VSI types are typically managed by network administrator. This is an integer value.
typeidversion - The VSI Type Version allows multiple versions of a VSI Type. This is an integer value.
instanceid - The VSI Instance ID Identifier is generated when a VSI instance (i.e. a virtual interface of a virtual machine) is created. This is a globally unique identifier."

That's what we know about vepa and vn-link now. I guess that once we have the possibility to test these environments we will learn more about them.
Perhaps it would be best to have virtualport parameters on both network and guest interface XML, and merge the two to arrive at what's used (the network definition could contain all the attributes that would be common to all guests using that network on that host, and the guest interface definition would contain extra parameters specific to that guest. In the case of a parameter being specified in both places, if they were not identical, the migration would fail).
Sounds good.
This illustrates the problem I was wondering about - in your example it would not be possible for the guest to migrate from the host using a vepa switch to the host using a vnlink switch (and it would be possible

You are right. When trying to migrate between vepa and vnlink there will be missing attributes in each in case we leave it on the host.
(you mean if we leave the config on the *guest*, I guess...)
to migrate to a host using a standard bridge only if the virtualport element was ignored). If the virtualport element lived with the network definition of red-network on each host, it could be migrated without problem.
Then, when migrating from a vepa/vnlink host to another vepa/vnlink host containing red-network, the profile attributes will be available in the guest domain XML. In case the target host has a red-network which isn't vepa/vnlink, we want to be able to choose whether to make the use of the profile attributes optional (i.e., libvirt won't fail in case of migrating to a network of another type) or mandatory (i.e., libvirt will fail in case of migration to a non-vepa/vnlink network).
We have something similar in CPU flags:

<cpu match="exact">
  <model>qemu64</model>
  <topology sockets="S" cores="C" threads="T"/>
  <feature policy="require/optional/disable......" name="sse2"/>
</cpu>

In this analogy, does "CPU flags" == "mode (vepa/vnlink/bridge)" or does "CPU flags" == "virtualport parameters"? It seems like what you're wanting can be satisfied by simply not defining "red-network" on the hosts that don't have the proper networking setup available (maybe what you *really* want to call it is "red-vnlink-network").

The only problematic thing would be if any of the attributes within <parameters> was unique for each guest (I don't know anything about the individual attributes, but "instanceid" sounds like it might be different for each guest).

What I meant to say there is that we would like to have the ability to say whether an attribute must be used or not.
Sure, it sounds useful. Would what I outlined above be sufficient? (It would allow you to say "this guest must have a vepa network connection" or "this guest can have any network connection, as long as it's named 'red-network'". It *won't* allow saying "this guest must have vepa or vnlink, bridge is not allowed, even if the network name is the same".) You could also put most of the config with the host network definition, but allow, e.g., instanceid to be specified in the guest config.
I think this would indeed be enough.
The issues you mention are indeed interesting. I'm cc-ing libvirt-list to see what other people think. Putting it on the guest will indeed make it problematic to migrate between networks that need different parameters (vnlink/vepa for example).

To "cut to the chase", go down to the end of the message, where I outline the proposed XML changes to support this. If everyone approves of it, I'll make the RNG based on that description and update the parser, then worry about filling in the functionality. On 04/29/2011 09:29 AM, Oved Ourfalli wrote:
See my comments below.
Thank you, Oved
This is very useful, Oved! Thanks for the replies.
----- Original Message -----
From: "Laine Stump"<laine@laine.org> To: libvir-list@redhat.com Sent: Friday, April 29, 2011 3:45:50 PM Subject: Re: [libvirt] migration of vnlink VMs On 04/28/2011 04:15 AM, Oved Ourfalli wrote:
Laine, hello.
We read your proposal for abstraction of the guest <--> host network connection in libvirt.
You have an open issue there regarding the vepa/vnlink attributes: "3) What about the parameters in the <virtualport> element that are currently used by vepa/vnlink. Do those belong with the host, or with the guest?"
The parameters for the virtualport element should be on the guest, and not the host, because a specific interface can run multiple profiles,
Are you talking about host interface or guest interface? If you mean that multiple different profiles can be used when connecting to a particular switch - as long as there are only a few different profiles, rather than each guest having its own unique profile, then it still seems better to have the port profile live with the network definition (and just define multiple networks, one for each port profile).
The profile names can change regularly, so it looks like it will be better to put them at the guest level, so that the host network file won't have to be changed on all hosts once something changes in the profiles.
Also, you would have a duplication of data, writing all the profile names on all the hosts that are connected to the vn-link/vepa switch.

But is it potentially the same for many/all guests, or is it necessarily different for every guest? If it's the former, then do you have more guests, or more hosts?
I guess it will be the same for many guests. There will be some profiles, and each group of guests will use the same profile, according to its demands.
Except instanceid, as you point out below. Since there is at least one exception to the "same for all guests", we do need to keep <virtualport> with the guest, but since many of the parameters are unchanged, we should also supply it with the network. Things are starting to take shape now :-)
so it will be a mistake to define a profile to be interface-specific on the host. Moreover, putting it at the guest level will enable us in the future (if supported by libvirt/qemu) to migrate a VM from a host with vepa/vnlink interfaces to another host with a bridge, for example.
It seems to me like doing exactly the opposite would make it easier to migrate to a host that used a different kind of switching (from vepa to vnlink, or from a bridged interface to vepa, etc), since the port profile required for a particular host's network would be at the host waiting to be used.

You are right, but we would want to have the option to prevent that from happening in case we don't want to allow it. We can make the ability to migrate between different network types configurable, and we would like an easy way to tell libvirt - "please allow/don't allow it".

I *think* what you're getting at is this situation:
HostA has a group of interfaces that are connected to a vepa-capable switch, HostB has a group of interfaces connected to a vnlink-capable switch. Guest1 is allowed to connect either via a vnlink switch or a vepa switch, but Guest2 should only use vepa.
In that case, HostA would have a network that had a pool of interfaces and type "vepa", while HostB would have a pool of interfaces and a type "vnlink". Guest1 could be accommodated by giving both networks the same name, or Guest2 could be accommodated by giving each network a different name (when migrating, if the dest. host doesn't have the desired network, the migration would fail). However, using just the network naming, it wouldn't be possible to allow both.
I don't think keeping the virtualport parameters only with the guest would help (or hurt) this though. What would be needed would be to have the information about network type *optionally* specified in the guest interface config (as well as in the network config); if present the migration would only succeed if the given network on the dest host matched the given type (and parameters, if any) in the guest config.
That would be great. It will enable the flexibility we need.
So, in the networks at the host level you will have:

<network type='direct'>
  <name>red-network</name>
  <source mode='vepa'>
    <pool>
      <interface>
        <name>eth0</name>
        .....
      </interface>
      <interface>
        <name>eth4</name>
        .....
      </interface>
      <interface>
        <name>eth18</name>
        .....
      </interface>
    </pool>
I'm assuming there will never be a case when an interface pool might need to be used by *both* a vepa network and a vnlink network (it doesn't really make sense - the pool will be physically connected to a switch that supports vnlink, or to one that supports vepa; by definition it can't be connected to both, and I don't foresee any switch that supports both!). So defining the pool inside network should work okay. Since each interface will always have a name (and just one name), I think it makes sense to make that an attribute rather than a subelement.
  </source>
</network>
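In other words, a minimal sketch of the change Laine is suggesting (eth0 is simply the name from the example above):

<!-- form used in the example above: name as a subelement -->
<interface>
  <name>eth0</name>
</interface>

<!-- suggested form: name as an attribute -->
<interface name='eth0'/>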
And in the guest you will have (for vepa):

<interface type='network'>
  <source network='red-network'/>
and if you want to force connecting only to vepa or vnlink, you can do: <source network='red-network' mode='vepa'/> for example.
<virtualport type="802.1Qbg"> <parameters managerid="11" typeid="1193047" typeidversion="2" instanceid="09b11c53-8b5c-4eeb-8f00-d84eaa0aaa4f"/> </virtualport>
Likewise, anything that's specified here MUST either match what is in the <network> definition on the host, or be unspecified on the host. The final virtualport definition used will be the OR of the two.
</interface>
Or (for vnlink):

<interface type='network'>
  <source network='red-network'/>
  <virtualport type="802.1Qbh">
    <parameters profile_name="profile1"/>
  </virtualport>
</interface>

What would the interface for a 2nd guest of each type look like? Could it be identical? Or might some parameters change for every single guest?

For vn-link it will be the same, just the profile_name. As for vepa, the instanceid is VM-specific, so it should be on the guest (taken from http://libvirt.org/formatdomain.html):
"managerid - The VSI Manager ID identifies the database containing the VSI type and instance definitions. This is an integer value and the value 0 is reserved. typeid - The VSI Type ID identifies a VSI type characterizing the network access. VSI types are typically managed by network administrator. This is an integer value. typeidversion - The VSI Type Version allows multiple versions of a VSI Type. This is an integer value. instanceid - The VSI Instance ID Identifier is generated when a VSI instance (i.e. a virtual interface of a virtual machine) is created. This is a globally unique identifier."
That's what we know about vepa and vn-link now. I guess that once we have the possibility to test these environments we will learn more about them.
Perhaps it would be best to have virtualport parameters on both network and guest interface XML, and merge the two to arrive at what's used (the network definition could contain all the attributes that would be common to all guests using that network on that host, and the guest interface definition would contain extra parameters specific to that guest. In the case of a parameter being specified in both places, if they were not identical, the migration would fail).
Sounds good.
Okay, here's a brief description of what I *think* will work. I'll build up the RNG based on this pseudo-xml:

For the <interface> definition in the guest XML, the main change will be that <source .. mode='something'> will be valid (but optional) when interface type='network' - in this case, it will just be used to match against the source mode of the network on the host. <virtualport> will also become valid for type='network', and will serve two purposes:

1) if there is a mismatch with the virtualport on the host network, the migrate/start will fail.
2) It will be ORed with <virtualport> on the host network to arrive at the virtualport settings actually used.

For example:

<interface type='network'>
  <source network='red-network' mode='vepa'/>
  <virtualport type='802.1Qbg'>
    <parameters instanceid='09b11c53-8b5c-4eeb-8f00-d84eaa0aaa4f'/>
  </virtualport>
  <mac address='xx:xx:.....'/>
</interface>

(NB: if "mode" isn't specified, and the host network is actually a bridge or virtual network, the contents of virtualport will be ignored.)

<network> will be expanded by giving it an optional "type" attribute (which will default to 'virtual'), a <source> subelement, and a <virtualport> subelement. When type='bridge', you can specify source exactly as you would in a domain <interface> definition:

<network type='bridge'>
  <name>red-network</name>
  <source bridge='br0'/>
</network>

When type='direct', again you can specify source and virtualport pretty much as you would in an interface definition:

<network type='direct'>
  <name>red-network</name>
  <source dev='eth0' mode='vepa'/>
  <virtualport type='802.1Qbg'>
    <parameters managerid="11" typeid="1193047" typeidversion="2"
                instanceid='09b11c53-8b5c-4eeb-8f00-d84eaa0aaa4f'/>
  </virtualport>
</network>

However, dev would be optional - if not specified, we would expect a pool of interfaces to be defined within source, eg:

<network type='direct'>
  <name>red-network</name>
  <source mode='vepa'>
    <pool>
      <interface name='eth10' maxConnect='1'/>
      <interface name='eth11' maxConnect='1'/>
      <interface name='eth12' maxConnect='1'/>
      <interface name='eth13' maxConnect='1'/>
      <interface name='eth14' maxConnect='1'/>
      <interface name='eth25' maxConnect='5'/>
    </pool>
  </source>
  <virtualport ...... />
</network>

At connect time, source dev would be allocated from the pool of interfaces, round robin, with each interface having at most maxConnect connections to guests at any given time. Again, <virtualport> is optional, and if specified would have the same purpose as <virtualport> in the interface definition.

Does this look like it covers everything?

BTW, for all the people asking about sectunnel, openvswitch, and vde - can you see how those would fit in with this? In particular, do you see any conflicts? (It's easy to add more stuff on later if something is just missing, but much more problematic if I put something in that is just plain wrong).
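To make purpose (1) above concrete, here is a sketch (not from the thread) of a guest definition that would be rejected against the type='direct' network shown above, because its typeid contradicts the network's; the mismatching value is invented for illustration:

<!-- host network above declares typeid='1193047'; this guest asks for a different one -->
<interface type='network'>
  <source network='red-network' mode='vepa'/>
  <virtualport type='802.1Qbg'>
    <parameters typeid='1193048'
                instanceid='09b11c53-8b5c-4eeb-8f00-d84eaa0aaa4f'/>
  </virtualport>
</interface>
<!-- under the proposed rule, starting or migrating this guest on that host would fail -->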

On Fri, Apr 29, 2011 at 04:12:55PM -0400, Laine Stump wrote:
Okay, here's a brief description of what I *think* will work. I'll build up the RNG based on this pseudo-xml:
For the <interface> definition in the guest XML, the main change will be that <source .. mode='something'> will be valid (but optional) when interface type='network' - in this case, it will just be used to match against the source mode of the network on the host. <virtualport> will also become valid for type='network', and will serve two purposes:
1) if there is a mismatch with the virtualport on the host network, the migrate/start will fail.
2) It will be ORed with <virtualport> on the host network to arrive at the virtualport settings actually used.
For example:
<interface type='network'>
  <source network='red-network' mode='vepa'/>
IMHO having a 'mode' here is throwing away the main reason for using type=network in the first place - namely independence from this host config element.
  <virtualport type='802.1Qbg'>
    <parameters instanceid='09b11c53-8b5c-4eeb-8f00-d84eaa0aaa4f'/>
  </virtualport>
  <mac address='xx:xx:.....'/>
</interface>
(NB: if "mode" isn't specified, and the host network is actually a bridge or virtual network, the contents of virtualport will be ignored.)
<network> will be expanded by giving it an optional "type" attribute (which will default to 'virtual'), a <source> subelement, and a <virtualport> subelement. When type='bridge', you can specify source exactly as you would in a domain <interface> definition:
<network type='bridge'>
  <name>red-network</name>
  <source bridge='br0'/>
</network>
When type='direct', again you can specify source and virtualport pretty much as you would in an interface definition:
<network type='direct'>
  <name>red-network</name>
  <source dev='eth0' mode='vepa'/>
  <virtualport type='802.1Qbg'>
    <parameters managerid="11" typeid="1193047" typeidversion="2"
                instanceid='09b11c53-8b5c-4eeb-8f00-d84eaa0aaa4f'/>
  </virtualport>
</network>
None of this really feels right to me. With this proposed schema, there is basically nothing in common between the existing functionality for <network> and this new functionality except for the <name> and <uuid> elements.

Apps which know how to deal with the existing <network> schema will have no ability to interpret this new data at all. Quite probably they will mis-interpret it as providing an isolated virtual network, with no IP addr set, since this design isn't actually changing any attribute value that they currently look for.

Either we need to make this align with the existing schema, or we need to put this under a completely separate set of APIs. I think we can likely do better with the schema design and achieve the former.
However, dev would be optional - if not specified, we would expect a pool of interfaces to be defined within source, eg:
<network type='direct'>
  <name>red-network</name>
  <source mode='vepa'>
    <pool>
      <interface name='eth10' maxConnect='1'/>
      <interface name='eth11' maxConnect='1'/>
      <interface name='eth12' maxConnect='1'/>
      <interface name='eth13' maxConnect='1'/>
      <interface name='eth14' maxConnect='1'/>
      <interface name='eth25' maxConnect='5'/>
    </pool>
  </source>
  <virtualport ...... />
</network>
I don't really like the fact that this design has special-cased the num(interfaces) == 1 case to have a completely different XML schema. eg we have this:

<source dev='eth0' mode='vepa'/>

And this

<source mode='vepa'>
  <pool>
    <interface name='eth10' maxConnect='1'/>
  </pool>

both meaning the same thing. There should only be one representation in the schema for this kind of thing.
BTW, for all the people asking about sectunnel, openvswitch, and vde - can you see how those would fit in with this? In particular, do you see any conflicts? (It's easy to add more stuff on later if something is just missing, but much more problematic if I put something in that is just plain wrong).
As mentioned above, I think this design is wrong, because it is not taking any account of the current schema for <network> which defines the various routed modes.

Currently <network> supports 3 connectivity modes:

 - Non-routed network, separate subnet (no <forward> element present)
 - Routed network, separate subnet with NAT (<forward mode='nat'/>)
 - Routed network, separate subnet (<forward mode='route'/>)

Following on from this, I can see another couple of routed modes:

 - Routed network, IP subnetting
 - Routed network, separate subnet with VPN

And the core goal here is to replace type=bridge and type=direct from the domain XML, which means we're adding several bridging modes:

 - Bridged network, eth + bridge + tap (akin to type=bridge)
 - Bridged network, eth + macvtap (akin to type=direct)
 - Bridged network, sriov eth + bridge + tap (akin to type=bridge)
 - Bridged network, sriov eth + macvtap (akin to type=direct)

The macvtap can be in 4 modes, so perhaps it is better to consider them separately:

 - Bridged network, eth + bridge + tap
 - Bridged network, eth + macvtap + vepa
 - Bridged network, eth + macvtap + private
 - Bridged network, eth + macvtap + passthrough
 - Bridged network, eth + macvtap + bridge
 - Bridged network, sriov eth + bridge + tap
 - Bridged network, sriov eth + macvtap + vepa
 - Bridged network, sriov eth + macvtap + private
 - Bridged network, sriov eth + macvtap + passthrough
 - Bridged network, sriov eth + macvtap + bridge

I can also perhaps imagine another VPN mode:

 - Bridged network, with VPN

The current routed modes can route to anywhere, or be restricted to a particular network interface, eg with <forward dev='eth0'/>. It only allows for a single interface, though even for routed modes it could be desirable to list multiple devs.

The other big distinction is that the <network> modes which do routing include interface configuration data (ie the IP addrs & bridge name) which is configured on the fly. It looks like with the bridged modes, you're assuming the app has statically configured the interfaces via the virInterface APIs already, and this just points to an existing configured interface. This isn't necessarily a bad thing, just an observation of a significant difference.

So if we ignore the <ip> and <domain> elements from the current <network> schema, then there are a handful of others which we need to have a plan for:

 <forward mode='nat|route'/>  (omitted completely for isolated networks)
 <bridge name="virbr0"/>      (auto-generated/filled if omitted)
 <mac address='....'/>        (auto-generated/filled if omitted)

The <forward> element can have an optional dev= attribute.

I think the key attribute is the <forward> mode= attribute. I think we should be adding further values to that attribute for the new network modes we want to support. We should also make use of the dev= attribute on <forward> where practical, and/or extend it.

We could expand the list of <forward> mode values in a flat list:

 - route
 - nat
 - bridge (brctl)
 - vepa
 - private
 - passthru
 - bridge (macvtap)

NB: really need to avoid using 'bridge' in terminology, since all 5 of the last options are really 'bridge'.
Or we could introduce an extra attribute, and have a 2-level list:

 - <forward layer='link'/>    (for all ethernet layer bridging)
 - <forward layer='network'/> (for all IP layer bridging aka routing)

So the current modes would be

 <forward layer='network' mode='route|nat'/>

And new bridging modes would be

 <forward layer='link' mode='bridge-brctl|vepa|private|passthru|bridge-macvtap'/>

For the brctl/macvtap modes, the dev= attribute on <forward> could point to the NIC being used, while with brctl modes, <bridge> would also be present.

In the SRIOV case, we potentially need a list of interfaces. For this we probably want to use

<forward dev='eth0'>
  <interface dev='eth0'/>
  <interface dev='eth1'/>
  <interface dev='eth2'/>
  ...
</forward>

NB, the first interface is always to be listed both as a dev= attribute (for compat with existing apps) *and* as a child <interface> element (for apps knowing the new schema).

The maxConnect= attribute from your examples above is an interesting thing. I'm not sure whether that is necessarily a good idea. It feels similar to VMWare's "port group" idea, but I don't think having a simple 'maxConnect=' attribute is sufficient to let us represent the vmware port group idea. I think we might need a more explicit element, eg

<portgroup count='5'>
  <interface dev='eth2'/>
</portgroup>

eg, so this associates a port group which allows 5 clients (VM NICs) with the uplink provided by eth2 (which is assumed to be listed under <forward>).

So a complete SRIOV example might be

<network>
  <name>Foo</name>
  <forward dev='eth0' layer='link' mode='vepa'>
    <interface dev='eth0'/>
    <interface dev='eth1'/>
    <interface dev='eth2'/>
    ...
  </forward>
  <portgroup count='10'>
    <interface dev='eth0'/>
  </portgroup>
  <portgroup count='5'>
    <interface dev='eth1'/>
  </portgroup>
  <portgroup count='5'>
    <interface dev='eth2'/>
  </portgroup>
</network>

The <virtualport> parameters for VEPA/VNLink could either be stored at the top level under <network>, or inside <portgroup>, or both.

Regards,
Daniel
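For the plain brctl case, the combination Daniel describes (dev= on <forward> pointing at the NIC, plus a <bridge> element) might look something like the following sketch. This only illustrates the proposal, not an implemented schema, and br0/eth0 are hypothetical pre-configured host devices:

<network>
  <name>red-network</name>
  <!-- eth0 is the uplink NIC; br0 is an already-configured host bridge -->
  <forward layer='link' mode='bridge-brctl' dev='eth0'/>
  <bridge name='br0'/>
</network>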

On Tue, May 24, 2011 at 10:29:00PM +0200, Jérémie Tarot wrote:
Hi,
2011/5/24 Daniel P. Berrange <berrange@redhat.com>
...
And new bridging modes would be
<forward layer='link' mode='bridge-brctl|vepa|private|passthru|bridge-macvtap'/>
VDE? Openvswitch?
I would be very happy to push openvswitch and suggest to people to rely on it ... once it is accepted in the upstream kernel. Unfortunately that's still not the case, and the history of hypervisors like openvz and xen really shows why this should be their #1 focus, instead of being put on the back-burner TODO list as it seems [*]

Daniel

[*] I would love to be proved wrong on this!

-- Daniel Veillard

I'll send a separate email (in a new thread, so it doesn't get lost! ;-) with a new draft of what the network XML should look like, but wanted to respond to Dan's comments inline...

On 05/24/2011 10:21 AM, Daniel P. Berrange wrote:
On Fri, Apr 29, 2011 at 04:12:55PM -0400, Laine Stump wrote:
Okay, here's a brief description of what I *think* will work. I'll build up the RNG based on this pseudo-xml:
For the <interface> definition in the guest XML, the main change will be that <source .. mode='something'> will be valid (but optional) when interface type='network' - in this case, it will just be used to match against the source mode of the network on the host. <virtualport> will also become valid for type='network', and will serve two purposes:
1) if there is a mismatch with the virtualport on the host network, the migrate/start will fail.
2) It will be ORed with <virtualport> on the host network to arrive at the virtualport settings actually used.
For example:
<interface type='network'>
  <source network='red-network' mode='vepa'/>

IMHO having a 'mode' here is throwing away the main reason for using type=network in the first place - namely independence from this host config element.
I agree, but was being accommodating :-) Since then, Dave has pointed out that the same functionality can be achieved by having the management application grab the XML for the network on the targeted host, and check for matches of any important parameters before deciding to migrate to that host. This has 2 advantages:

1) It is more flexible. The management application can check for more than just mode='vepa', but also any number of other attributes of the network on the target.

2) The result of a host's network not matching the desired mode will be "management app looks elsewhere", rather than "migration fails". The management application will need to do this anyway (even if just to check that the given network is present at all) or, again, face the prospect of the migration failing.

So I'll withdraw this piece from the next draft.
  <virtualport type='802.1Qbg'>
    <parameters instanceid='09b11c53-8b5c-4eeb-8f00-d84eaa0aaa4f'/>
  </virtualport>
  <mac address='xx:xx:.....'/>
</interface>
(NB: if "mode" isn't specified, and the host network is actually a bridge or virtual network, the contents of virtualport will be ignored.)
<network> will be expanded by giving it an optional "type" attribute (which will default to 'virtual'), a <source> subelement, and a <virtualport> subelement. When type='bridge', you can specify source exactly as you would in a domain <interface> definition:
<network type='bridge'>
  <name>red-network</name>
  <source bridge='br0'/>
</network>
When type='direct', again you can specify source and virtualport pretty much as you would in an interface definition:
<network type='direct'>
  <name>red-network</name>
  <source dev='eth0' mode='vepa'/>
  <virtualport type='802.1Qbg'>
    <parameters managerid="11" typeid="1193047" typeidversion="2"
                instanceid='09b11c53-8b5c-4eeb-8f00-d84eaa0aaa4f'/>
  </virtualport>
</network>

None of this really feels right to me. With this proposed schema, there is basically nothing in common between the existing functionality for <network> and this new functionality except for the <name> and <uuid> elements.
Apps which know how to deal with the existing <network> schema will have no ability to interpret this new data at all. Quite probably they will mis-interpret it as providing an isolated virtual network, with no IP addr set, since this design isn't actually changing any attribute value that they currently look for.
Either we need to make this align with the existing schema, or we need to put this under a completely separate set of APIs. I think we can likely do better with the schema design and achieve the former.
So the problem is that the new uses are so orthogonal to the current usage that existing management apps encountering this new XML will mistakenly believe that it's "old" XML with a bit of extra stuff that can be ignored (thus leading to mayhem). I think the most important thing is to make sure that a config for one of these new types will have at least one change to an *existing* element/attribute (mine just added a *new* attribute specifying type) that causes existing apps to realize this isn't just an old school network definition that happens to have a few kinks on the side. Your suggestion of using new values for <forward mode="..."> seems like as good an idea as any (actually I can't think of anything else that works as well :-)
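A sketch of why the <forward mode='...'> approach gives old apps something to trip over (the values follow Daniel's proposed list; the schema is still under discussion, not implemented):

<!-- existing style: an app that knows today's schema understands this -->
<network>
  <name>red-network</name>
  <forward mode='nat'/>
</network>

<!-- proposed style: the unknown mode value signals "not an old-school network" -->
<network>
  <name>red-network</name>
  <forward mode='vepa' dev='eth0'/>
</network>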
However, dev would be optional - if not specified, we would expect a pool of interfaces to be defined within source, eg:
<network type='direct'>
  <name>red-network</name>
  <source mode='vepa'>
    <pool>
      <interface name='eth10' maxConnect='1'/>
      <interface name='eth11' maxConnect='1'/>
      <interface name='eth12' maxConnect='1'/>
      <interface name='eth13' maxConnect='1'/>
      <interface name='eth14' maxConnect='1'/>
      <interface name='eth25' maxConnect='5'/>
    </pool>
  </source>
  <virtualport ...... />
</network>

I don't really like the fact that this design has special-cased the num(interfaces) == 1 case to have a completely different XML schema. eg we have this:
<source dev='eth0' mode='vepa'/>
And this
<source mode='vepa'>
  <pool>
    <interface name='eth10' maxConnect='1'/>
  </pool>
both meaning the same thing. There should only be one representation in the schema for this kind of thing.

BTW, for all the people asking about sectunnel, openvswitch, and vde - can you see how those would fit in with this? In particular, do you see any conflicts? (It's easy to add more stuff on later if something is just missing, but much more problematic if I put something in that is just plain wrong).

As mentioned above, I think this design is wrong, because it is not taking any account of the current schema for <network> which defines the various routed modes.
Currently <network> supports 3 connectivity modes:

 - Non-routed network, separate subnet (no <forward> element present)
 - Routed network, separate subnet with NAT (<forward mode='nat'/>)
 - Routed network, separate subnet (<forward mode='route'/>)
Following on from this, I can see another couple of routed modes
 - Routed network, IP subnetting
 - Routed network, separate subnet with VPN
And the core goal here is to replace type=bridge and type=direct from the domain XML, which means we're adding several bridging modes
 - Bridged network, eth + bridge + tap (akin to type=bridge)
 - Bridged network, eth + macvtap (akin to type=direct)
 - Bridged network, sriov eth + bridge + tap (akin to type=bridge)
 - Bridged network, sriov eth + macvtap (akin to type=direct)
The macvtap can be in 4 modes, so perhaps it is better to consider them separately
 - Bridged network, eth + bridge + tap
 - Bridged network, eth + macvtap + vepa
 - Bridged network, eth + macvtap + private
 - Bridged network, eth + macvtap + passthrough
 - Bridged network, eth + macvtap + bridge
 - Bridged network, sriov eth + bridge + tap
 - Bridged network, sriov eth + macvtap + vepa
 - Bridged network, sriov eth + macvtap + private
 - Bridged network, sriov eth + macvtap + passthrough
 - Bridged network, sriov eth + macvtap + bridge
I can also perhaps imagine another VPN mode:
- Bridged network, with VPN
The current routed modes can route to anywhere, or be restricted to a particular network interface, eg with <forward dev='eth0'/>. It only allows for a single interface, though even for routed modes it could be desirable to list multiple devs.
The other big distinction is that the <network> modes which do routing include interface configuration data (ie the IP addrs & bridge name) which is configured on the fly. It looks like with the bridged modes, you're assuming the app has statically configured the interfaces via the virInterface APIs already, and this just points to an existing configured interface. This isn't necessarily a bad thing, just an observation of a significant difference.

Right. Perhaps later it can be expanded (at least in some of the modes) to set up these devices when the network is started, but right now the network definition is just used to point to something that already exists and is functioning.
So if we ignore the <ip> and <domain> elements from the current <network> schema, then there are a handful of others which we need to have a plan for:
 <forward mode='nat|route'/>  (omitted completely for isolated networks)
 <bridge name="virbr0"/>      (auto-generated/filled if omitted)
 <mac address='....'/>        (auto-generated/filled if omitted)
The <forward> element can have an optional dev= attribute.
I think the key attribute is the <forward> mode= attribute. I think we should be adding further values to that attribute for the new network modes we want to support. We should also make use of the dev= attribute on <forward> where practical, and/or extend it.
We could expand the list of <forward> mode values in a flat list
 - route
 - nat
 - bridge (brctl)
 - vepa
 - private
 - passthru
 - bridge (macvtap)
NB: really need to avoid using 'bridge' in terminology, since all 5 of the last options are really 'bridge'.
Or we could introduce an extra attribute, and have a 2-level list
 - <forward layer='link'/> (for all ethernet layer bridging)
Does that gain us anything, though? While it's correct information, it seems redundant (the layer can always be implied from the mode).
 - <forward layer='network'/> (for all IP layer bridging aka routing)
So the current modes would be
<forward layer='network' mode='route|nat'/>
And new bridging modes would be
<forward layer='link' mode='bridge-brctl|vepa|private|passthru|bridge-macvtap'/>
For the brctl/macvtap modes, the dev= attribute on <forward> could point to the NIC being used, while with brctl modes, <bridge> would also be present.
Are you saying that in the case of a brctl mode, it would be required to fill in both of these?

<forward mode="bridge-brctl" dev="br0" .../>
<bridge name="br0" .../>

I think I would prefer to only use the one in <forward>. Are you suggesting putting it there to help older management apps cope with the new modes? I don't really think it would help; it's really just an accident of implementation that the device in "bridge-brctl" mode happens to be a bridge device.
In the SRIOV case, we potentially need a list of interfaces. For this we probably want to use
BTW, just to clarify, when you say "SRIOV", what you really mean is "any situation where there are multiple network interface devices connected to the same physical network, and identical connectivity to the guest could be provided by any one of these devices". In other words, it doesn't need to be an SRIOV ethernet card with multiple virtual functions, it could also be an older style setup with multiple physical cards, or multiple complete devices on a single card.
<forward dev='eth0'>
  <interface dev='eth0'/>
  <interface dev='eth1'/>
  <interface dev='eth2'/>
  ...
</forward>
NB, the first interface is always to be listed both as a dev= attribute (for compat with existing apps) *and* as a child <interface> element (for apps knowing the new schema).
But since the pool of devices would only ever be used in one of the new forward modes, which an existing app wouldn't understand anyway, would that really buy us anything?
The maxConnect= attribute from your examples above is an interesting thing. I'm not sure whether that is necessarily a good idea. It feels similar to VMWare's "port group" idea, but I don't think having a simple 'maxConnect=' attribute is sufficient to let us represent the vmware port group idea. I think we might need a more explicit element, eg
<portgroup count='5'>
  <interface dev='eth2'/>
</portgroup>
eg, so this associates a port group which allows 5 clients (VM NICs) with the uplink provided by eth2 (which is assumed to be listed under <forward>).
I've thought about this a bit, and I think portgroup is a good idea, but I don't think the name of the device being used fits there. portgroup is a good place to put information about the characteristics of a set of connections, but which device to use is a backend implementation detail, and there isn't necessarily a 1:1 correspondence between the two. portgroup would be used, for example, to configure bandwidth (that's pretty much all VMWare uses it for, plus a blob of "vendor-specific" data), and the guest interface XML would specify which portgroup a guest was going to belong to - if you also set which physical device to use based on portgroup, that would leave the guest XML specifying which physical device to use, which is what we're trying to get away from (and also it would mean that each physical device would need its own portgroup, which I don't think we want).

Thinking more about the maxConnect thing, it seems like it might be overkill for now. The case where there must be a limitation of 1 guest per NIC is macvtap passthrough mode, but that's already implied by the fact that it's passthrough. Other than that, libvirt can just attempt to load-balance as best as possible by keeping track of how many connections there are on each device, but not force any artificial limit. We may need to provide some method of reporting the number of connections to any particular network, to be used by a management application for load-balancing decisions (although the amount of traffic is probably more important, and that can already be learned).

Conclusion on portgroup - a good idea, but not for this, probably for configuration of bandwidth limiting.
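A hypothetical sketch of that bandwidth-oriented use of <portgroup>; the element and attribute names below (portgroup name=, <bandwidth>, the units) are invented purely for illustration, not an agreed or existing schema:

<network>
  <name>red-network</name>
  <forward layer='link' mode='vepa' dev='eth0'/>
  <portgroup name='engineering'>
    <!-- hypothetical per-group bandwidth limits, in some agreed unit -->
    <bandwidth inbound='1000' outbound='1000'/>
  </portgroup>
  <portgroup name='qa'>
    <bandwidth inbound='250' outbound='250'/>
  </portgroup>
</network>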
So a complete SRIOV example might be
<network>
  <name>Foo</name>
  <forward dev='eth0' layer='link' mode='vepa'>
    <interface dev='eth0'/>
    <interface dev='eth1'/>
    <interface dev='eth2'/>
    ...
  </forward>
  <portgroup count='10'>
    <interface dev='eth0'/>
  </portgroup>
  <portgroup count='5'>
    <interface dev='eth1'/>
  </portgroup>
  <portgroup count='5'>
    <interface dev='eth2'/>
  </portgroup>
</network>
The <virtualport> parameters for VEPA/VNLink could either be stored at the top level under <network>, or inside <portgroup>, or both.
Ah, now *there's* something that fits in portgroup (since that's likely exactly what it's used for on the vepa/vnlink capable switch). I think it's reasonable to put it in both places: at the top level (which would apply to all connections) and in portgroup (which would override the global setting for connections using that portgroup). (I think the bandwidth config could be done in the same way.)
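As a sketch of that layering, with values copied from the earlier examples (the named-portgroup shape, and the overriding typeid value, are made up here just to show the override):

<network>
  <name>red-network</name>
  <forward dev='eth0' mode='vepa'>
    <interface dev='eth0'/>
    <interface dev='eth1'/>
  </forward>
  <!-- default applied to every connection to this network -->
  <virtualport type="802.1Qbg">
    <parameters managerid="11" typeid="1193047" typeidversion="2"/>
  </virtualport>
  <portgroup name='engineering'>
    <!-- overrides the network-level default for guests using this portgroup -->
    <virtualport type="802.1Qbg">
      <parameters managerid="11" typeid="1193099" typeidversion="2"/>
    </virtualport>
  </portgroup>
</network>

(instanceid is left out of both, since it is per-guest, as discussed elsewhere in the thread.)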

Sorry for hijacking the thread for something more general than vnlink: It just occurred to me today that the concept of virtual network should be used also in the <graphics> section. Currently, if you specify <graphics listen="some-ip-address"> you make the domain practically unmigratable. There should be a method to specify a symbolic network name, to be evaluated to an IP on the destination host, something like <graphics listenNetwork="some-virtual-network">

Regards, Dan.
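For concreteness, the proposed form would look something like this inside a domain definition (listenNetwork is just the attribute name suggested above, not existing libvirt syntax, and 'red-network' is reused from the earlier examples):

<graphics type='vnc' autoport='yes' listenNetwork='red-network'/>

libvirt on the destination host would then resolve 'red-network' to whatever address that network has there, instead of carrying a host-specific listen IP around in the guest XML.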

On Wed, May 11, 2011 at 03:28:30PM +0300, Dan Kenigsberg wrote:
Sorry for hijacking the thread for something more general than vnlink:
It just occurred to me today that the concept of virtual network should be used also in the <graphics> section. Currently, if you specify <graphics listen="some-ip-address"> you make the domain practically unmigratable.
There should be a method to specify a symbolic network name, to be evaluated to an IP on the destination host, something like
<graphics listenNetwork="some-virtual-network">
Yep, I pretty much agree, but we'll need more help from QEMU to do that. A virtual network may have multiple IP addresses and so we'll need to be able to specify multiple IPs to the VNC server.

Regards, Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Wed, May 11, 2011 at 01:37:25PM +0100, Daniel P. Berrange wrote:
On Wed, May 11, 2011 at 03:28:30PM +0300, Dan Kenigsberg wrote:
Sorry for hijacking the thread for something more general than vnlink:
It just occurred to me today that the concept of virtual network should be used also in the <graphics> section. Currently, if you specify <graphics listen="some-ip-address"> you make the domain practically unmigratable.
There should be a method to specify a symbolic network name, to be evaluated to an IP on the destination host, something like
<graphics listenNetwork="some-virtual-network">
Yep, I pretty much agree, but we'll need more help from QEMU to do that. A virtual network may have multiple IP addresses and so we'll need to be able to specify multiple IPs to the VNC server.
That would be nice, but I think that a semantics where libvirt is in charge of telling which qemu process is to listen on which of the network's IP addresses is just as reasonable (and may potentially be better, since libvirt has more knowledge of the host). Anyway, if the virtual network is bridge-based, I could live with passing the single IP of the bridge to qemus that do not support this feature.

Regards, Dan.
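Concretely, for a bridge-based red-network defined on the destination host as, say (addresses made up for illustration):

<network>
  <name>red-network</name>
  <bridge name='virbr1'/>
  <ip address='192.168.100.1' netmask='255.255.255.0'/>
</network>

libvirt could translate the symbolic form into a plain listen address when starting QEMU there, i.e. treat <graphics type='vnc' autoport='yes' listenNetwork='red-network'/> as the equivalent of <graphics type='vnc' autoport='yes' listen='192.168.100.1'/>. The multiple-<ip> case Daniel mentions is where QEMU would need to accept more than one listen address, or libvirt would have to pick one as suggested above. Again, listenNetwork here is only the attribute name proposed earlier in the thread, not existing syntax.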

-----Original Message----- From: libvir-list-bounces@redhat.com [mailto:libvir-list- bounces@redhat.com] On Behalf Of Oved Ourfalli Sent: Thursday, April 28, 2011 1:15 AM To: Laine Stump; libvir-list@redhat.com Subject: Re: [libvirt] migration of vnlink VMs
----- Original Message -----
From: "Laine Stump" <lstump@redhat.com> To: "Oved Ourfalli" <ovedo@redhat.com> Cc: "Ayal Baron" <abaron@redhat.com>, "Barak Azulay" <bazulay@redhat.com>, "Shahar Havivi" <shaharh@redhat.com>, "Itamar Heim" <iheim@redhat.com>, "Dan Kenigsberg" <danken@redhat.com> Sent: Thursday, April 28, 2011 10:20:35 AM Subject: Re: migration of vnlink VMs Oved,
Would it be okay to repost this message to the thread on libvir-list so that other parties can add their thoughts?
Of course. I'm sending my answer to the libvirt list.
On 04/27/2011 09:58 AM, Oved Ourfalli wrote:
Laine, hello.
We read your proposal for abstraction of guest<--> host network connection in libvirt.
You has an open issue there regarding the vepa/vnlink attributes: "3) What about the parameters in the<virtualport> element that are currently used by vepa/vnlink. Do those belong with the host, or with the guest?"
The parameters for the virtualport element should be on the guest, and not the host, because a specific interface can run multiple profiles,
Are you talking about host interface or guest interface? If you mean that multiple different profiles can be used when connecting to a particular switch - as long as there are only a few different profiles, rather than each guest having its own unique profile, then it still seems better to have the port profile live with the network definition (and just define multiple networks, one for each port profile).
The profile names can be changed regularly, so it looks like it will be better to put them in the guest level, so that the network host file won't have to be changed on all hosts once something has changed in the profiles.
Inline...
Also, you will have a duplication of data, writing all the profile name on all the hosts that are connected to the vn-link/vepa switch.
so it will be a mistake to define a profile to be interface specific on the host. Moreover, putting it in the guest level will enable us in the future (if supported by libvirt/qemu) to migrate a vm from a host with vepa/vnlink interfaces, to another host with a bridge, for example.
It seems to me like doing exactly the opposite would make it easier to migrate to a host that used a different kind of switching (from vepa to vnlink, or from a bridged interface to vepa, etc), since the port profile required for a particular host's network would be at the host waiting to be used. You are right, but we would want to have the option to prevent that from happening in case we wouldn't want to allow it. We can make the ability to migrate between different network types configurable, and we would like an easy way to tell libvirt - "please allow/don't allow it".
So, in the networks at the host level you will have: <network type='direct'> <name>red-network</name> <source mode='vepa'> <pool> <interface> <name>eth0</name> ..... </interface> <interface> <name>eth4</name> ..... </interface> <interface> <name>eth18</name> ..... </interface> </pool> </source> </network>
And in the guest you will have (for vepa): <interface type='network'> <source network='red-network'/> <virtualport type="802.1Qbg"> <parameters managerid="11" typeid="1193047" typeidversion="2" instanceid="09b11c53-8b5c-4eeb-8f00-d84eaa0aaa4f"/> </virtualport> </interface>
Or (for vnlink): <interface type='network'> <source network='red-network'/> <virtualport type="802.1Qbh"> <parameters profile_name="profile1"/> </virtualport> </interface>
This illustrates the problem I was wondering about - in your example it would not be possible for the guest to migrate from the host using a vepa switch to the host using a vnlink switch (and it would be possible
You are right. When trying to migrate between vepa and vnlink there will be missing attributes in each in case we leave it on the host.
to migrate to a host using a standard bridge only if the virtualport element was ignored). If the virtualport element lived with the network definition of red-network on each host, it could be migrated without problem.
The only problematic thing would be if any of the attributes within <parameters> was unique for each guest (I don't know anything about the individual attributes, but "instanceid" sounds like it might be different for each guest).
Whether a given parameter is unique for each guest (or, let's say, whether it can be shared by two or more guests) may also be a config/policy detail for certain parameters. You should take into account that new parameters may be added later on (not only for vepa/vn-link), and they could be either unique or shared. For this reason your design should be able to handle such a case. BTW, "instanceid" identifies the virtual nic, therefore it is unique (it is a UUID).
From http://libvirt.org/formatdomain.html#elementsNICS:
instanceid: The VSI Instance ID Identifier is generated when a VSI instance (i.e. a virtual interface of a virtual machine) is created. This is a globally unique identifier.
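To make the shared-vs-unique distinction concrete, two guests attached to the same red-network could carry identical managerid/typeid/typeidversion values while each has its own instanceid (the second UUID below is made up for illustration):

guest A:  <parameters managerid="11" typeid="1193047" typeidversion="2"
                      instanceid="09b11c53-8b5c-4eeb-8f00-d84eaa0aaa4f"/>
guest B:  <parameters managerid="11" typeid="1193047" typeidversion="2"
                      instanceid="6c1f0c2e-4a3d-47a9-9d2b-5b8e1f0a7c33"/>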
Then, when migrating from a vepa/vnlink host to another vepa/vnlink host containing red-network, the profile attributes will be available at the guest domain xml. In case the target host has a red-network, which isn't vepa/vnlink, we want to be able to choose whether to make the use of the
Please note that the IEEE specs (802.1Qbh/1Qbg) have undergone some changes in the last few months. The two protos are going to share the same proto, VDP, for the port profile configuration. Also, new parameters may be introduced and already existing ones may change type. Later this year, when the IEEE drafts are closer to final versions, there will be updates to the configuration parameters to reflect the changes in the drafts.
profile attributes optional (i.e., libvirt won't fail in case of migrating to a network of another type), or mandatory (i.e., libvirt will fail in case of migration to a non-vepa/vnlink network).
We have something similar in CPU flags:
<cpu match="exact">
  <model>qemu64</model>
  <topology sockets="S" cores="C" threads="T"/>
  <feature policy="require/optional/disable......" name="sse2"/>
</cpu>
In this analogy, does "CPU flags" == "mode (vepa/vnlink/bridge)" or does "CPU flags" == "virtualport parameters" ? It seems like what you're wanting can be satisfied by simply not defining "red-network" on the hosts that don't have the proper networking setup available (maybe what you *really* want to call it is "red-vnlink-network").
What I meant to say there is that we would like to have the ability to say whether an attribute must be used, or not.
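One way to picture that request, borrowing the policy idea from the CPU example above (this is purely an illustrative sketch, not existing or proposed libvirt syntax):

<interface type='network'>
  <source network='red-network'/>
  <!-- policy="optional": silently drop the virtualport config when the
       target network cannot honour it; policy="mandatory" would make
       the migration fail instead -->
  <virtualport type="802.1Qbg" policy="optional">
    <parameters managerid="11" typeid="1193047" typeidversion="2"
                instanceid="09b11c53-8b5c-4eeb-8f00-d84eaa0aaa4f"/>
  </virtualport>
</interface>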
"must" in the sense that it is "mandatory"? I can see two cases: 1) src and dst networks are of different types (NOTE: I consider vepa/vnlink different for now, but this will probably change when BH and BG will both use VDP) In this case I do not see why you need to worry about whether parameter param-X used by src host should or should not be used by dst host: it should only if it is a generic parameter and (as such) it does not fall inside the config section that is specific to the network type. Trying to translate parameters between different network types may not be always easy and clean. Even the property "mandatory vs optional" may change with different network types. 2) src and dst networks are of the same type In this case it _does_ make sense to have the possibility of specifying whether a given param is needed or not. However, I believe it would make sense mainly for those parameters that represent optional/desirable features of the proto/net: such config would then be used to decide whether migration will or will not be possible, right? I like the idea of the abstraction, especially the pool of interfaces. However, I think you would have to lose a bit of abstraction in order to make it possible to have migrations between network of different types. I guess your goal is not to make migration possible between each possible combination of network types, is it? There are parameters that are specific to a given network type. How can you expect a migration from network type X to network type Y (Y != X) if you only configure the parameters for type X (and assuming Y comes with at least one mandatory parameter)? What are the combinations of (src/net, dst/net) that you would like to support? Out of curiosity, have you taken into consideration the possibility of defining an abstracted network config as a pool of network types? For example something like this: HOST: Pool of three network sub-types <network type='network'> <name>red-network</name> <source type='direct' mode='vepa'> ... </source> <source type=XXX ...> ... </source> <source type=YYY ...> ... </source> </network> GUEST WHICH ONLY ACCEPTS ONE SPECIFIC TYPE (direct/private): <interface type='network'> <source network='red-network'/> <option prio=1 type='direct' mode='private'> <virtualport type="802.1Qbh"> <parameters profile_name="profile_123"/> </virtualport> </option> </interface> GUEST WHICH ACCEPTS TWO TYPES (direct/private, direct/vepa): <interface type='network'> <source network='red-network'/> <option prio=1 type='direct' mode='private'> <virtualport type="802.1Qbh"> <parameters profile_name="profile_123"/> </virtualport> </option> <option prio=2 type='direct' mode='vepa'> <virtualport type="802.1Qbg"> <parameters managerid="11" typeid="1193047" typeidversion="2" instanceid="09b11c53-8b5c-4eeb-8f00-d84eaa0aaa4f"/> </virtualport> </option> </interface> During migration, the dst host would use the options/prios in the guest config XML to select among the options available for the same 'network' on the dst host, just like any other similar handshake. I am not suggesting this approach, I just would like to know if you ever thought about this option and, if you did, why you discarded it. An example of scenario where the above pool of network types may make sense would be the combination vepa+bridge or vnlink+bridge: if something goes wrong during migration and the vepa/OR/vnlink cannot associate the port profile, the guest can at least have a backup net connection through the bridge. 
It could optionally also re-try the vepa/vnlink association a number of times (libvirt would do it if configured to do so) ... while maintaining temporary connectivity through the bridge.
/Chris
The issues you mention are indeed interesting. I'm cc-ing libvirt-list to see what other people think. Putting it on the guest will indeed make it problematic to migrate between networks that need different parameters (vnlink/vepa for example).
Oved
participants (7)
- Christian Benvenuti (benve)
- Dan Kenigsberg
- Daniel P. Berrange
- Daniel Veillard
- Jérémie Tarot
- Laine Stump
- Oved Ourfalli