[libvirt] [RFC Incomplete Patch] Libvirt + Openvswitch

newer
[libvirt] [PATCH v2 0/1] netlink...

Dan Wendlandt

27 Jan 2012 27 Jan '12

7:58 p.m.

Hello all, I know of many people who want to spin up VMs using libvirt + kvm/qemu and attach those VMs to an openvswitch bridge (see: http://www.openvswitch.org). However, the only way I know of to get this working is a kludge that uses to tap devices along with <interface type="ethernet"> while running ovs-vsctl outside of libvirt. Even worse, doing this on RHEL/Fedora seems to require privilege tweaks (e.g., running qemu as root, not dropping capabilities), which may not be acceptable for production deployments (see: http://fedoraproject.org/wiki/How_to_debug_Virtualization_problems#Errors_us...). So I would like to start taking steps toward better libvirt/openvswitch integration. My initial step has the fairly limit goal of enabling kvm/qemu VM NICs to attach to an openvswitch bridge in much the same way VM NIC can already attached to the linux bridge. For example, specifying: <interface type="openvswitch"> <source bridge="br0"/> <mac address="ca:fe:de;ad:be:ef"/> </interface> I also wanted to add a new child element of <interface> that can be used to specify some OVS specific configuration. To start, I just want to expose a single 'interfaceid' value (this parameter is specific to openvswitch). Extending the previous example: <interface type="openvswitch"> <source bridge="br0"/> <mac address="ca:fe:de;ad:be:ef"/> <openvswitchport interfaceid="interface-xyz"/> </interface> For this first step (see attached patch), I am only targeting the model where the OVS bridge has been created externally ahead of time. I am not tackling any of the "network" logic that actually creates/destroys bridges, as it is not needed for my target use case. A couple notes about the attached patch: - I haven't actually run this code. I was just poking around the libvirt code to learn about it and figured that a diff was the most concrete way to propose what I was thinking of doing. I would be curious for pointers about big chunks of work that I may have missed (for example, I haven't added any tests yet). - the code was modeled on the network interface "bandwidth" code, that calls out to 'tc' to configure bandwidth rates. Ideally we would be able to make direct C calls to OVS (and we plan to make that possible in the future), but calling an external utility right now is the only viable path. - no attention was paid to style guidelines. Will do that before any real submission. - I wasn't very clear on the notion of an "actual" net def, as opposed to a normal net def. What's the best place to look for documentation on this? Anyway, I'm primarily looking for feedback about whether I'm barking up the right tree before I spend time debugging or polishing the patch, so I would appreciate thoughts, advice, etc. Thanks, Dan -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~ Dan Wendlandt Nicira Networks: www.nicira.com twitter: danwendlandt ~~~~~~~~~~~~~~~~~~~~~~~~~~~

Attachments:

attachment.html (text/html — 3.5 KB)
ovs.diff (text/x-patch — 25.9 KB)

Show replies by date

kmestery

28 Jan 28 Jan

4:22 a.m.

Hi Dan: We at Cisco have been looking at this as well, and in fact were going to propose some things in this area as well. Our proposal (which frankly just builds on top of what you have): The proposal at hand is to add some libvirt configuration as follows: <network> <name>ovs-network</name> <forward mode='ovs' dev='br0'> <script path='/etc/qemu-ifup-ovs'/> </forward> </network> This would setup the OVS network entity, and sets up a forwarding mode of "ovs", which indicates Open vSwitch is used to forward traffic. <interface type='network'> <source network='ovs-network'/> <virtualport type='openvswitch'> <parameters interfaceid="interface-xyz"/> </virtualport> </interface> This model of exposing parameters to virtualport types allows for further expansion as new interface types and parameters are added. Thoughts? Thanks, Kyle On Jan 27, 2012, at 4:58 AM, Dan Wendlandt wrote:

...

Hello all,

I know of many people who want to spin up VMs using libvirt + kvm/qemu and attach those VMs to an openvswitch bridge (see: http://www.openvswitch.org). However, the only way I know of to get this working is a kludge that uses to tap devices along with <interface type="ethernet"> while running ovs-vsctl outside of libvirt. Even worse, doing this on RHEL/Fedora seems to require privilege tweaks (e.g., running qemu as root, not dropping capabilities), which may not be acceptable for production deployments (see: http://fedoraproject.org/wiki/How_to_debug_Virtualization_problems#Errors_us... ).

So I would like to start taking steps toward better libvirt/openvswitch integration. My initial step has the fairly limit goal of enabling kvm/qemu VM NICs to attach to an openvswitch bridge in much the same way VM NIC can already attached to the linux bridge. For example, specifying:

<interface type="openvswitch"> <source bridge="br0"/> <mac address="ca:fe:de;ad:be:ef"/> </interface>

I also wanted to add a new child element of <interface> that can be used to specify some OVS specific configuration. To start, I just want to expose a single 'interfaceid' value (this parameter is specific to openvswitch). Extending the previous example:

<interface type="openvswitch"> <source bridge="br0"/> <mac address="ca:fe:de;ad:be:ef"/> <openvswitchport interfaceid="interface-xyz"/> </interface>

For this first step (see attached patch), I am only targeting the model where the OVS bridge has been created externally ahead of time. I am not tackling any of the "network" logic that actually creates/destroys bridges, as it is not needed for my target use case.

A couple notes about the attached patch: - I haven't actually run this code. I was just poking around the libvirt code to learn about it and figured that a diff was the most concrete way to propose what I was thinking of doing. I would be curious for pointers about big chunks of work that I may have missed (for example, I haven't added any tests yet). - the code was modeled on the network interface "bandwidth" code, that calls out to 'tc' to configure bandwidth rates. Ideally we would be able to make direct C calls to OVS (and we plan to make that possible in the future), but calling an external utility right now is the only viable path. - no attention was paid to style guidelines. Will do that before any real submission. - I wasn't very clear on the notion of an "actual" net def, as opposed to a normal net def. What's the best place to look for documentation on this?

Anyway, I'm primarily looking for feedback about whether I'm barking up the right tree before I spend time debugging or polishing the patch, so I would appreciate thoughts, advice, etc. Thanks,

Dan

Roopa Prabhu

4:45 a.m.

On 1/27/12 11:22 AM, "kmestery" <kmestery@cisco.com> wrote:

...

Hi Dan:

We at Cisco have been looking at this as well, and in fact were going to propose some things in this area as well. Our proposal (which frankly just builds on top of what you have):

The proposal at hand is to add some libvirt configuration as follows:

<network> <name>ovs-network</name> <forward mode='ovs' dev='br0'> <script path='/etc/qemu-ifup-ovs'/> </forward> </network>

This would setup the OVS network entity, and sets up a forwarding mode of "ovs", which indicates Open vSwitch is used to forward traffic.

<interface type='network'> <source network='ovs-network'/> <virtualport type='openvswitch'> <parameters interfaceid="interface-xyz"/> </virtualport> </interface>

This model of exposing parameters to virtualport types allows for further expansion as new interface types and parameters are added.

Thoughts?

The existing forward mode bridge could also work perhaps. If there are no special cases for ovs. <network> <name>ovs-network</name> <forward mode='bridge' dev='ovs-bridge'> <script path='/etc/qemu-ifup-ovs'/> </forward> </network> Or maybe the real syntax is, <network> <name>ovs-network</name> <forward mode='bridge'> <bridge name='ovs-bridge'/> <script path='/etc/qemu-ifup-ovs'/> </forward> </network> Thanks, Roopa

Laine Stump

6:06 a.m.

On 01/27/2012 02:45 PM, Roopa Prabhu wrote:

...

The existing forward mode bridge could also work perhaps. If there are no special cases for ovs.

<network> <name>ovs-network</name> <forward mode='bridge' dev='ovs-bridge'> <script path='/etc/qemu-ifup-ovs'/> </forward> </network>

This one would be confused with macvtap in bridge mode (which is recognized by mode='bridge' and the presence of a dev attribute (or rather, an array of <interface> elements being present, of which by definition the first is copied into the dev attribute for historical reasons).

...

Or maybe the real syntax is,

<network> <name>ovs-network</name> <forward mode='bridge'> <bridge name='ovs-bridge'/> <script path='/etc/qemu-ifup-ovs'/> </forward> </network>

A human might confuse that one with a host bridge network (which is the same, except the <bridge> element isn't inside the <forward> element). I'm leaning towards: <network> <name>ovs-network</name> <forward mode='bridge'/> <bridge name='ovs-bridge' type='openvswitch'/> </network>

Laine Stump

5:55 a.m.

On 01/27/2012 02:22 PM, kmestery wrote:

...

Hi Dan:

We at Cisco have been looking at this as well, and in fact were going to propose some things in this area as well. Our proposal (which frankly just builds on top of what you have):

The proposal at hand is to add some libvirt configuration as follows:

<network> <name>ovs-network</name> <forward mode='ovs' dev='br0'>

Nice! It might be good to settle on a common nomenclature for openvswitch between the network code and interface code. I think I might lean more towards your "openvswitch" than Dan's "ovs", just because it's immediately more obvious to someone who doesn't know what they're looking for. Rather than using the dev attribute, I think it would be more appropriate to use the separate <bridge name='xxx' <bridge name='br0'/> The dev attribute currently has two uses (depending on forward mode) - 1) for virtual networks it indicates which device must be used for all traffic from this network as an egress from the host, and 2) for macvtap modes, it indicates both which interface the guest should connect to, AND which device to use for egress from the host (since both are the same). As far as I understand it, the openvswitch model is more like our bridge mode, where <bridge name='xx'/> is used to indicate which device the guest's tap interfaces should connect to. One advantage of this other than consistency, is that this way a management application that didn't understand the ovs forwarding mode would still properly understand that guests using this network would have their interfaces connected to br0. Alternately, since the openvswitch is presumably equivalent to a bridge, you could follow the method networks that use host bridges and macvtap in "bridge" mode - they both use <forward mode='bridge'>, but are differentiated by whether or not they have a <bridge name='xxx'/> element, or a dev='xx' attribute. Possibly the forward mode would still be bridge, and a separate "<openvswitch name='br0'/> (or something like that) could be used, or maybe mode could remain "bridge", and the <bridge> element could get a new attribute, e.g.: <bridge name='br0' type='openvswitch'/> Actually I think I like this best, because someone looking at the reduced set of attributes would still get all the useful info they needed: 1) it is a bridged connection to the rest of the network (rather than routed) and 2) the device used is br0.

...

<script path='/etc/qemu-ifup-ovs'/>

What do you need the script for? ifup scripts bring up security questions, are currently only permitted with qemu domains for "generic ethernet" interfaces (which are really there just as a way to make *something* work when one of the existing supported types doesn't work), and result in the domain being marked as "tainted". The preferred method is to put whatever functionality is needed directly into libvirt.

...

</forward> </network>

This would setup the OVS network entity, and sets up a forwarding mode of "ovs", which indicates Open vSwitch is used to forward traffic.

<interface type='network'> <source network='ovs-network'/> <virtualport type='openvswitch'> <parameters interfaceid="interface-xyz"/> </virtualport>

I like this re-used of virtualport, rather than defining a new element.

...

</interface>

This model of exposing parameters to virtualport types allows for further expansion as new interface types and parameters are added.

Thoughts?

Uh, "how about some patches?" :-) This all seems to be coming together rather nicely.

kmestery

6:41 a.m.

On Jan 27, 2012, at 2:55 PM, Laine Stump wrote:

...

On 01/27/2012 02:22 PM, kmestery wrote:

...
Hi Dan:

We at Cisco have been looking at this as well, and in fact were going to propose some things in this area as well. Our proposal (which frankly just builds on top of what you have):

The proposal at hand is to add some libvirt configuration as follows:

<network> <name>ovs-network</name> <forward mode='ovs' dev='br0'>

Nice!

It might be good to settle on a common nomenclature for openvswitch between the network code and interface code. I think I might lean more towards your "openvswitch" than Dan's "ovs", just because it's immediately more obvious to someone who doesn't know what they're looking for.

I agree, using openvswitch makes sense and is more obvious.

...

Rather than using the dev attribute, I think it would be more appropriate to use the separate <bridge name='xxx'

<bridge name='br0'/>

The dev attribute currently has two uses (depending on forward mode) - 1) for virtual networks it indicates which device must be used for all traffic from this network as an egress from the host, and 2) for macvtap modes, it indicates both which interface the guest should connect to, AND which device to use for egress from the host (since both are the same).

As far as I understand it, the openvswitch model is more like our bridge mode, where <bridge name='xx'/> is used to indicate which device the guest's tap interfaces should connect to.

One advantage of this other than consistency, is that this way a management application that didn't understand the ovs forwarding mode would still properly understand that guests using this network would have their interfaces connected to br0.

This all makes sense, thanks for the detailed explanation Laine!

...

Alternately, since the openvswitch is presumably equivalent to a bridge, you could follow the method networks that use host bridges and macvtap in "bridge" mode - they both use <forward mode='bridge'>, but are differentiated by whether or not they have a <bridge name='xxx'/> element, or a dev='xx' attribute. Possibly the forward mode would still be bridge, and a separate "<openvswitch name='br0'/> (or something like that) could be used, or maybe mode could remain "bridge", and the <bridge> element could get a new attribute, e.g.:

<bridge name='br0' type='openvswitch'/>

Actually I think I like this best, because someone looking at the reduced set of attributes would still get all the useful info they needed: 1) it is a bridged connection to the rest of the network (rather than routed) and 2) the device used is br0.

This makes sense.

...

...
<script path='/etc/qemu-ifup-ovs'/>

What do you need the script for? ifup scripts bring up security questions, are currently only permitted with qemu domains for "generic ethernet" interfaces (which are really there just as a way to make *something* work when one of the existing supported types doesn't work), and result in the domain being marked as "tainted".

The preferred method is to put whatever functionality is needed directly into libvirt.

Agreed, we can remove this and add the functionality needed into libvirt.

...

...
</forward> </network>

This would setup the OVS network entity, and sets up a forwarding mode of "ovs", which indicates Open vSwitch is used to forward traffic.

<interface type='network'> <source network='ovs-network'/> <virtualport type='openvswitch'> <parameters interfaceid="interface-xyz"/> </virtualport>

I like this re-used of virtualport, rather than defining a new element.

...
</interface>

This model of exposing parameters to virtualport types allows for further expansion as new interface types and parameters are added.

Thoughts?

Uh, "how about some patches?" :-) This all seems to be coming together rather nicely.

Lets see what Dan says. I can rework his patch to accommodate the above if he is ok with it. Thanks! Kyle

Dan Wendlandt

31 Jan 31 Jan

5:41 a.m.

Hi Kyle! Funny how we keep bumping into each other... I hope you're keeping warm in Minnesota :) On Fri, Jan 27, 2012 at 11:22 AM, kmestery <kmestery@cisco.com> wrote:

...

Hi Dan:

We at Cisco have been looking at this as well, and in fact were going to propose some things in this area as well. Our proposal (which frankly just builds on top of what you have):

I agree, I think the two proposals are complementary. Our first goal was to enable the basic mode of plugging an interface into an OVS bridge that was created outside of libvirt. This would require changes to the <interface> XML only, and would mirror how libvirt already let's one plug into an existing bridge using <interface type="bridge">. The second step would be to also allow libvirt to actually create + configure the OVS bridges as well. This I believe would map very closely to the XML you and Roopa have suggested. We would need to put some thought into what of the many configuration options on an OVS bridge need to be exposed as part of the OVS <network> XML (e.g., how to configure an OpenFlow controller, etc.). These are definitely discussions worth having, but I was hoping to start out with a fairly clean initial patch to achieve our first goal. I do like the idea of using the virtual port construct even in the initial <interface> only case. For example: <interface type='bridge'> <bridge name='br0'> <virtualport type="openvswitch"> <parameters interfaceid='xyzzy'/> </virtualport> </interface> This would seem to create a nice parallel with the model you proposed where <virtualport> is used with <interface type="network">. I'm still not sure whether the "type=openvswitch" should be an attribute of the <interface>, <bridge>, or <virtualport> element. Either way, I think the underlying code will be treating OVS + Linux Bridge as two different cases, so I believe this is really just a matter of consistently of presentation in XML. Dan

...

The proposal at hand is to add some libvirt configuration as follows:

<network> <name>ovs-network</name> <forward mode='ovs' dev='br0'> <script path='/etc/qemu-ifup-ovs'/> </forward> </network>

This would setup the OVS network entity, and sets up a forwarding mode of "ovs", which indicates Open vSwitch is used to forward traffic.

<interface type='network'> <source network='ovs-network'/> <virtualport type='openvswitch'> <parameters interfaceid="interface-xyz"/> </virtualport> </interface>

This model of exposing parameters to virtualport types allows for further expansion as new interface types and parameters are added.

Thoughts?

Thanks, Kyle

On Jan 27, 2012, at 4:58 AM, Dan Wendlandt wrote:

...
Hello all,

I know of many people who want to spin up VMs using libvirt + kvm/qemu and attach those VMs to an openvswitch bridge (see: http://www.openvswitch.org). However, the only way I know of to get this working is a kludge that uses to tap devices along with <interface type="ethernet"> while running ovs-vsctl outside of libvirt. Even worse, doing this on RHEL/Fedora seems to require privilege tweaks (e.g., running qemu as root, not dropping capabilities), which may not be acceptable for production deployments (see: http://fedoraproject.org/wiki/How_to_debug_Virtualization_problems#Errors_us... ).

So I would like to start taking steps toward better libvirt/openvswitch integration. My initial step has the fairly limit goal of enabling kvm/qemu VM NICs to attach to an openvswitch bridge in much the same way VM NIC can already attached to the linux bridge. For example, specifying:

<interface type="openvswitch"> <source bridge="br0"/> <mac address="ca:fe:de;ad:be:ef"/> </interface>

I also wanted to add a new child element of <interface> that can be used to specify some OVS specific configuration. To start, I just want to expose a single 'interfaceid' value (this parameter is specific to openvswitch). Extending the previous example:

<interface type="openvswitch"> <source bridge="br0"/> <mac address="ca:fe:de;ad:be:ef"/> <openvswitchport interfaceid="interface-xyz"/> </interface>

For this first step (see attached patch), I am only targeting the model where the OVS bridge has been created externally ahead of time. I am not tackling any of the "network" logic that actually creates/destroys bridges, as it is not needed for my target use case.

A couple notes about the attached patch: - I haven't actually run this code. I was just poking around the libvirt code to learn about it and figured that a diff was the most concrete way to propose what I was thinking of doing. I would be curious for pointers about big chunks of work that I may have missed (for example, I haven't added any tests yet). - the code was modeled on the network interface "bandwidth" code, that calls out to 'tc' to configure bandwidth rates. Ideally we would be able to make direct C calls to OVS (and we plan to make that possible in the future), but calling an external utility right now is the only viable path. - no attention was paid to style guidelines. Will do that before any real submission. - I wasn't very clear on the notion of an "actual" net def, as opposed to a normal net def. What's the best place to look for documentation on this?

Anyway, I'm primarily looking for feedback about whether I'm barking up the right tree before I spend time debugging or polishing the patch, so I would appreciate thoughts, advice, etc. Thanks,

Dan

-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~ Dan Wendlandt Nicira Networks: www.nicira.com twitter: danwendlandt ~~~~~~~~~~~~~~~~~~~~~~~~~~~

kmestery

5:49 a.m.

On Jan 30, 2012, at 2:41 PM, Dan Wendlandt wrote:

...

Hi Kyle! Funny how we keep bumping into each other... I hope you're keeping warm in Minnesota :)

On Fri, Jan 27, 2012 at 11:22 AM, kmestery <kmestery@cisco.com> wrote:

...
Hi Dan:

We at Cisco have been looking at this as well, and in fact were going to propose some things in this area as well. Our proposal (which frankly just builds on top of what you have):

I agree, I think the two proposals are complementary. Our first goal was to enable the basic mode of plugging an interface into an OVS bridge that was created outside of libvirt. This would require changes to the <interface> XML only, and would mirror how libvirt already let's one plug into an existing bridge using <interface type="bridge">.

This makes sense.

...

The second step would be to also allow libvirt to actually create + configure the OVS bridges as well. This I believe would map very closely to the XML you and Roopa have suggested. We would need to put some thought into what of the many configuration options on an OVS bridge need to be exposed as part of the OVS <network> XML (e.g., how to configure an OpenFlow controller, etc.). These are definitely discussions worth having, but I was hoping to start out with a fairly clean initial patch to achieve our first goal.

OK, this makes sense too.

...

I do like the idea of using the virtual port construct even in the initial <interface> only case. For example:

<interface type='bridge'> <bridge name='br0'> <virtualport type="openvswitch"> <parameters interfaceid='xyzzy'/> </virtualport> </interface>

This would seem to create a nice parallel with the model you proposed where <virtualport> is used with <interface type="network">. I'm still not sure whether the "type=openvswitch" should be an attribute of the <interface>, <bridge>, or <virtualport> element. Either way, I think the underlying code will be treating OVS + Linux Bridge as two different cases, so I believe this is really just a matter of consistently of presentation in XML.

I think fundamentally an Open vSwitch bridge is different from a standard Linux, thus the "type=openvswitch" needs to be a part of the <bridge> for sure. I like adding it to the <virtualport> element above. I'm reworking the patch you sent last week to tie these all together, let me know if you plan to resubmit your patch as well so we can work together and not duplicate effort. Thanks! Kyle

...

Dan

...
The proposal at hand is to add some libvirt configuration as follows:

<network> <name>ovs-network</name> <forward mode='ovs' dev='br0'> <script path='/etc/qemu-ifup-ovs'/> </forward> </network>

This would setup the OVS network entity, and sets up a forwarding mode of "ovs", which indicates Open vSwitch is used to forward traffic.

<interface type='network'> <source network='ovs-network'/> <virtualport type='openvswitch'> <parameters interfaceid="interface-xyz"/> </virtualport> </interface>

This model of exposing parameters to virtualport types allows for further expansion as new interface types and parameters are added.

Thoughts?

Thanks, Kyle

On Jan 27, 2012, at 4:58 AM, Dan Wendlandt wrote:

...
Hello all,

I know of many people who want to spin up VMs using libvirt + kvm/qemu and attach those VMs to an openvswitch bridge (see: http://www.openvswitch.org). However, the only way I know of to get this working is a kludge that uses to tap devices along with <interface type="ethernet"> while running ovs-vsctl outside of libvirt. Even worse, doing this on RHEL/Fedora seems to require privilege tweaks (e.g., running qemu as root, not dropping capabilities), which may not be acceptable for production deployments (see: http://fedoraproject.org/wiki/How_to_debug_Virtualization_problems#Errors_us... ).

So I would like to start taking steps toward better libvirt/openvswitch integration. My initial step has the fairly limit goal of enabling kvm/qemu VM NICs to attach to an openvswitch bridge in much the same way VM NIC can already attached to the linux bridge. For example, specifying:

<interface type="openvswitch"> <source bridge="br0"/> <mac address="ca:fe:de;ad:be:ef"/> </interface>

I also wanted to add a new child element of <interface> that can be used to specify some OVS specific configuration. To start, I just want to expose a single 'interfaceid' value (this parameter is specific to openvswitch). Extending the previous example:

<interface type="openvswitch"> <source bridge="br0"/> <mac address="ca:fe:de;ad:be:ef"/> <openvswitchport interfaceid="interface-xyz"/> </interface>

For this first step (see attached patch), I am only targeting the model where the OVS bridge has been created externally ahead of time. I am not tackling any of the "network" logic that actually creates/destroys bridges, as it is not needed for my target use case.

A couple notes about the attached patch: - I haven't actually run this code. I was just poking around the libvirt code to learn about it and figured that a diff was the most concrete way to propose what I was thinking of doing. I would be curious for pointers about big chunks of work that I may have missed (for example, I haven't added any tests yet). - the code was modeled on the network interface "bandwidth" code, that calls out to 'tc' to configure bandwidth rates. Ideally we would be able to make direct C calls to OVS (and we plan to make that possible in the future), but calling an external utility right now is the only viable path. - no attention was paid to style guidelines. Will do that before any real submission. - I wasn't very clear on the notion of an "actual" net def, as opposed to a normal net def. What's the best place to look for documentation on this?

Anyway, I'm primarily looking for feedback about whether I'm barking up the right tree before I spend time debugging or polishing the patch, so I would appreciate thoughts, advice, etc. Thanks,

Dan

Daniel P. Berrange

9:01 p.m.

On Mon, Jan 30, 2012 at 02:49:03PM -0600, kmestery wrote:

...

On Jan 30, 2012, at 2:41 PM, Dan Wendlandt wrote:

...
Hi Kyle! Funny how we keep bumping into each other... I hope you're keeping warm in Minnesota :)

On Fri, Jan 27, 2012 at 11:22 AM, kmestery <kmestery@cisco.com> wrote:

...
Hi Dan:

We at Cisco have been looking at this as well, and in fact were going to propose some things in this area as well. Our proposal (which frankly just builds on top of what you have):

I agree, I think the two proposals are complementary. Our first goal was to enable the basic mode of plugging an interface into an OVS bridge that was created outside of libvirt. This would require changes to the <interface> XML only, and would mirror how libvirt already let's one plug into an existing bridge using <interface type="bridge">.

This makes sense.

...
The second step would be to also allow libvirt to actually create + configure the OVS bridges as well. This I believe would map very closely to the XML you and Roopa have suggested. We would need to put some thought into what of the many configuration options on an OVS bridge need to be exposed as part of the OVS <network> XML (e.g., how to configure an OpenFlow controller, etc.). These are definitely discussions worth having, but I was hoping to start out with a fairly clean initial patch to achieve our first goal.

OK, this makes sense too.

...
I do like the idea of using the virtual port construct even in the initial <interface> only case. For example:

<interface type='bridge'> <bridge name='br0'> <virtualport type="openvswitch"> <parameters interfaceid='xyzzy'/> </virtualport> </interface>

This would seem to create a nice parallel with the model you proposed where <virtualport> is used with <interface type="network">. I'm still not sure whether the "type=openvswitch" should be an attribute of the <interface>, <bridge>, or <virtualport> element. Either way, I think the underlying code will be treating OVS + Linux Bridge as two different cases, so I believe this is really just a matter of consistently of presentation in XML.

Yes, I prefer this design to the initial proposal.

...

...
I think fundamentally an Open vSwitch bridge is different from a standard Linux, thus the "type=openvswitch" needs to be a part of the <bridge> for sure. I like adding it to the <virtualport> element above.

NB, type='bridge' technically refers to the *concept* of bridging an interface to a LAN, not the implemntation of Linux software bridging. Thus it shouldn't change for OpenVSwitch which is also providing bridging to the LAN here. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Kyle Mestery

1 Feb 1 Feb

4:16 a.m.

On Jan 31, 2012, at 6:01 AM, Daniel P. Berrange wrote:

...

On Mon, Jan 30, 2012 at 02:49:03PM -0600, kmestery wrote:

...
On Jan 30, 2012, at 2:41 PM, Dan Wendlandt wrote:

...
Hi Kyle! Funny how we keep bumping into each other... I hope you're keeping warm in Minnesota :)

On Fri, Jan 27, 2012 at 11:22 AM, kmestery <kmestery@cisco.com> wrote:

...
Hi Dan:

We at Cisco have been looking at this as well, and in fact were going to propose some things in this area as well. Our proposal (which frankly just builds on top of what you have):

I agree, I think the two proposals are complementary. Our first goal was to enable the basic mode of plugging an interface into an OVS bridge that was created outside of libvirt. This would require changes to the <interface> XML only, and would mirror how libvirt already let's one plug into an existing bridge using <interface type="bridge">.

This makes sense.

...
The second step would be to also allow libvirt to actually create + configure the OVS bridges as well. This I believe would map very closely to the XML you and Roopa have suggested. We would need to put some thought into what of the many configuration options on an OVS bridge need to be exposed as part of the OVS <network> XML (e.g., how to configure an OpenFlow controller, etc.). These are definitely discussions worth having, but I was hoping to start out with a fairly clean initial patch to achieve our first goal.

OK, this makes sense too.

...
I do like the idea of using the virtual port construct even in the initial <interface> only case. For example:

<interface type='bridge'> <bridge name='br0'> <virtualport type="openvswitch"> <parameters interfaceid='xyzzy'/> </virtualport> </interface>

This would seem to create a nice parallel with the model you proposed where <virtualport> is used with <interface type="network">. I'm still not sure whether the "type=openvswitch" should be an attribute of the <interface>, <bridge>, or <virtualport> element. Either way, I think the underlying code will be treating OVS + Linux Bridge as two different cases, so I believe this is really just a matter of consistently of presentation in XML.

Yes, I prefer this design to the initial proposal.

...
...
I think fundamentally an Open vSwitch bridge is different from a standard Linux, thus the "type=openvswitch" needs to be a part of the <bridge> for sure. I like adding it to the <virtualport> element above.

NB, type='bridge' technically refers to the *concept* of bridging an interface to a LAN, not the implemntation of Linux software bridging. Thus it shouldn't change for OpenVSwitch which is also providing bridging to the LAN here.

Regards, Daniel

Dan and I working together to modify the patches in this direction. We'll send them out for review once we have them formatted and tested. Thanks, Kyle

Dan Wendlandt

2 Feb 2 Feb

3:59 a.m.

On Tue, Jan 31, 2012 at 4:01 AM, Daniel P. Berrange <berrange@redhat.com> wrote:

...

...
...
I do like the idea of using the virtual port construct even in the initial <interface> only case. For example:

<interface type='bridge'> <bridge name='br0'> <virtualport type="openvswitch"> <parameters interfaceid='xyzzy'/> </virtualport> </interface>

...

NB, type='bridge' technically refers to the *concept* of bridging an interface to a LAN, not the implemntation of Linux software bridging. Thus it shouldn't change for OpenVSwitch which is also providing bridging to the LAN here.

That makes sense. We'll stick with the example above. Dan

...

Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~ Dan Wendlandt Nicira Networks: www.nicira.com twitter: danwendlandt ~~~~~~~~~~~~~~~~~~~~~~~~~~~

Laine Stump

28 Jan 28 Jan

5:42 a.m.

On 01/27/2012 05:58 AM, Dan Wendlandt wrote:

...

Hello all,

I know of many people who want to spin up VMs using libvirt + kvm/qemu and attach those VMs to an openvswitch bridge (see: http://www.openvswitch.org). However, the only way I know of to get this working is a kludge that uses to tap devices along with <interface type="ethernet"> while running ovs-vsctl outside of libvirt. Even worse, doing this on RHEL/Fedora seems to require privilege tweaks (e.g., running qemu as root, not dropping capabilities), which may not be acceptable for production deployments (see: http://fedoraproject.org/wiki/How_to_debug_Virtualization_problems#Errors_us... ).

So I would like to start taking steps toward better libvirt/openvswitch integration. My initial step has the fairly limit goal of enabling kvm/qemu VM NICs to attach to an openvswitch bridge in much the same way VM NIC can already attached to the linux bridge. For example, specifying:

<interface type="openvswitch"> <source bridge="br0"/> <mac address="ca:fe:de;ad:be:ef"/> </interface>

I also wanted to add a new child element of <interface> that can be used to specify some OVS specific configuration. To start, I just want to expose a single 'interfaceid' value (this parameter is specific to openvswitch). Extending the previous example:

<interface type="openvswitch"> <source bridge="br0"/> <mac address="ca:fe:de;ad:be:ef"/> <openvswitchport interfaceid="interface-xyz"/> </interface>

This is a good start! It might be better to use the same virtualport element that is used for the various macvtap flavors, with a different type attribute, which would be used to determine which of attributes in the <parameters> sub-element to use - more or less how kmestery suggested in his followup proposing XML for network configuration. A consistency like this tends to result in less new code / data structures. (btw, you say that "interfaceid" is specific to openvswitch, but that's a generic enough name that it might be useful for someone else. Or maybe you could even re-use something from <virtualport>, maybe instanceid, which is also unique per interface)

...

For this first step (see attached patch), I am only targeting the model where the OVS bridge has been created externally ahead of time. I am not tackling any of the "network" logic that actually creates/destroys bridges, as it is not needed for my target use case.

A couple notes about the attached patch: - I haven't actually run this code. I was just poking around the libvirt code to learn about it and figured that a diff was the most concrete way to propose what I was thinking of doing. I would be curious for pointers about big chunks of work that I may have missed (for example, I haven't added any tests yet). - the code was modeled on the network interface "bandwidth" code, that calls out to 'tc' to configure bandwidth rates. Ideally we would be able to make direct C calls to OVS (and we plan to make that possible in the future), but calling an external utility right now is the only viable path. - no attention was paid to style guidelines. Will do that before any real submission. - I wasn't very clear on the notion of an "actual" net def, as opposed to a normal net def. What's the best place to look for documentation on this?

The ActualNetDef was invented as a way to retain the original config of the interface at runtime, while also providing the details of what kind of interface setup that config resulted in. So the NetDef might say the interface type is "network", but the ActualNetDef would reflect that the interface ended up using eth25 in direct (macvtap) private mode, for example. The <actual> element is only used internally to keep track of what resources are in use by a domain even in the case of libvirtd restarting (this is stored in xml files in libvirt's "statusdir"), and are never seen in the dumpxml returned by the libvirt public API, so you won't find documentation for them in the public API docs. In your case, until there is an openvswitch *network* type, the ActualNetDef won't be used for openvswitch interfaces. When there is an openvswitch network type, if an interface is of type='network', then the qemu domain startup code will call networkAllocateActualDevice(), which will look up the network, determine its type, allocate the necessary resources, and fill in the ActualNetDef to return to qemu. So, any resources that are specific to a particular type of interface, or could be filled in from a portgroup entry (e.g. bandwidth, virtualport) should also exist in the ActualNetDef, and a helper function made that returns the value from the ActualNetDef if it's there, or from the NetDef otherwise. For example, a network that is forward mode='passthrough' (indicating macvtap passthrough mode) may have a pool of devices for use by domains. Each domain's config only says type='network' network name='mynet', but at connect time, networkAllocateActualDevice is called, which picks an unused device and fills that into a new ActualNetDef (along with any common virtualport/bandwidth info from an appropriate portgroup), and returns that to the qemu driver. Now the qemu driver has the info that it will use interface "ethxx" in macvtap passthrough mode (by looking at the ActualNetDef), but the original config remains unspoiled.

...

Anyway, I'm primarily looking for feedback about whether I'm barking up the right tree before I spend time debugging or polishing the patch, so I would appreciate thoughts, advice, etc. Thanks,

Dan

-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~ Dan Wendlandt Nicira Networks: www.nicira.com <http://www.nicira.com> twitter: danwendlandt ~~~~~~~~~~~~~~~~~~~~~~~~~~~

ovs.diff

diff --git a/configure.ac b/configure.ac index 6709ebf..709128d 100644 --- a/configure.ac +++ b/configure.ac @@ -213,6 +213,8 @@ AC_PATH_PROG([UDEVSETTLE], [udevsettle], [], [/sbin:/usr/sbin:/usr/local/sbin:$PATH]) AC_PATH_PROG([MODPROBE], [modprobe], [], [/sbin:/usr/sbin:/usr/local/sbin:$PATH]) +AC_PATH_PROG([OVSVSCTL], [ovs-vsctl], [ovs-vsctl], + [/sbin:/usr/sbin:/usr/local/sbin:$PATH])

AC_DEFINE_UNQUOTED([DNSMASQ],["$DNSMASQ"], [Location or name of the dnsmasq program]) @@ -220,6 +222,9 @@ AC_DEFINE_UNQUOTED([RADVD],["$RADVD"], [Location or name of the radvd program]) AC_DEFINE_UNQUOTED([TC],["$TC"], [Location or name of the tc profram (see iproute2)]) +AC_DEFINE_UNQUOTED([OVSVSCTL],["$OVSVSCTL"], + [Location or name of the ovs-vsctl program]) + if test -n "$UDEVADM"; then AC_DEFINE_UNQUOTED([UDEVADM],["$UDEVADM"], [Location or name of the udevadm program]) diff --git a/src/Makefile.am b/src/Makefile.am index a44446f..679a060 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -94,6 +94,7 @@ UTIL_SOURCES = \ util/virkeymaps.h \ util/virnetdev.h util/virnetdev.c \ util/virnetdevbandwidth.h util/virnetdevbandwidth.c \ + util/virnetdevopenvswitch.h util/virnetdevopenvswitch.c \ util/virnetdevbridge.h util/virnetdevbridge.c \ util/virnetdevmacvlan.c util/virnetdevmacvlan.h \ util/virnetdevtap.h util/virnetdevtap.c \ @@ -133,6 +134,7 @@ LOCK_DRIVER_SANLOCK_SOURCES = \

NETDEV_CONF_SOURCES = \ conf/netdev_bandwidth_conf.h conf/netdev_bandwidth_conf.c \ + conf/netdev_openvswitch_conf.h conf/netdev_openvswitch_conf.c \ conf/netdev_vport_profile_conf.h conf/netdev_vport_profile_conf.c

# XML configuration format handling sources diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index e872396..eafcd9a 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -50,6 +50,7 @@ #include "count-one-bits.h" #include "secret_conf.h" #include "netdev_vport_profile_conf.h" +#include "netdev_openvswitch_conf.h" #include "netdev_bandwidth_conf.h"

#define VIR_FROM_THIS VIR_FROM_DOMAIN @@ -279,6 +280,7 @@ VIR_ENUM_IMPL(virDomainNet, VIR_DOMAIN_NET_TYPE_LAST, "network", "bridge", "internal", + "openvswitch", "direct")

VIR_ENUM_IMPL(virDomainNetBackend, VIR_DOMAIN_NET_BACKEND_TYPE_LAST, @@ -945,6 +947,10 @@ virDomainActualNetDefFree(virDomainActualNetDefPtr def) case VIR_DOMAIN_NET_TYPE_BRIDGE: VIR_FREE(def->data.bridge.brname); break; + case VIR_DOMAIN_NET_TYPE_OPENVSWITCH: + VIR_FREE(def->data.ovs.ovsPort->brname); + VIR_FREE(def->data.ovs.ovsPort->InterfaceID); + VIR_FREE(def->data.ovs.ovsPort); case VIR_DOMAIN_NET_TYPE_DIRECT: VIR_FREE(def->data.direct.linkdev); VIR_FREE(def->data.direct.virtPortProfile); @@ -998,6 +1004,12 @@ void virDomainNetDefFree(virDomainNetDefPtr def) VIR_FREE(def->data.direct.virtPortProfile); break;

+ case VIR_DOMAIN_NET_TYPE_OPENVSWITCH: + VIR_FREE(def->data.ovs.ovsPort->brname); + VIR_FREE(def->data.ovs.ovsPort->InterfaceID); + VIR_FREE(def->data.ovs.ovsPort); + break; + case VIR_DOMAIN_NET_TYPE_USER: case VIR_DOMAIN_NET_TYPE_LAST: break; @@ -3606,6 +3618,7 @@ virDomainActualNetDefParseXML(xmlNodePtr node, goto error; } if (actual->type != VIR_DOMAIN_NET_TYPE_BRIDGE&& + actual->type != VIR_DOMAIN_NET_TYPE_OPENVSWITCH&& actual->type != VIR_DOMAIN_NET_TYPE_DIRECT&& actual->type != VIR_DOMAIN_NET_TYPE_NETWORK) { virDomainReportError(VIR_ERR_INTERNAL_ERROR, @@ -3616,6 +3629,14 @@ virDomainActualNetDefParseXML(xmlNodePtr node,

if (actual->type == VIR_DOMAIN_NET_TYPE_BRIDGE) { actual->data.bridge.brname = virXPathString("string(./source[1]/@bridge)", ctxt); + } else if (actual->type == VIR_DOMAIN_NET_TYPE_OPENVSWITCH) { + actual->data.bridge.brname = virXPathString("string(./source[1]/@bridge)", ctxt); + + xmlNodePtr ovsPortNode = virXPathNode("./openvswitchport", ctxt); + if (ovsPortNode&& + (!(actual->data.ovs.ovsPort = + virNetDevOpenvswitchPortParse(ovsPortNode)))) + goto error; } else if (actual->type == VIR_DOMAIN_NET_TYPE_DIRECT) { xmlNodePtr virtPortNode;

@@ -3695,6 +3716,7 @@ virDomainNetDefParseXML(virCapsPtr caps, char *linkstate = NULL; virNWFilterHashTablePtr filterparams = NULL; virNetDevVPortProfilePtr virtPort = NULL; + virNetDevOpenvswitchPortPtr ovsPort = NULL; virDomainActualNetDefPtr actual = NULL; xmlNodePtr oldnode = ctxt->node; int ret; @@ -3736,6 +3758,10 @@ virDomainNetDefParseXML(virCapsPtr caps, (def->type == VIR_DOMAIN_NET_TYPE_BRIDGE)&& (xmlStrEqual(cur->name, BAD_CAST "source"))) { bridge = virXMLPropString(cur, "bridge"); + } else if ((network == NULL)&& + (def->type == VIR_DOMAIN_NET_TYPE_OPENVSWITCH)&& + (xmlStrEqual(cur->name, BAD_CAST "source"))) { + bridge = virXMLPropString(cur, "bridge");

you could just add an extra clause to the previous conditional. But that's a detail for later...

...

} else if ((dev == NULL)&& (def->type == VIR_DOMAIN_NET_TYPE_ETHERNET || def->type == VIR_DOMAIN_NET_TYPE_DIRECT)&& @@ -3748,6 +3774,11 @@ virDomainNetDefParseXML(virCapsPtr caps, xmlStrEqual(cur->name, BAD_CAST "virtualport")) { if (!(virtPort = virNetDevVPortProfileParse(cur))) goto error; + } else if ((ovsPort == NULL)&& + ((def->type == VIR_DOMAIN_NET_TYPE_OPENVSWITCH)&& + xmlStrEqual(cur->name, BAD_CAST "openvswitchport"))) { + if (!(ovsPort = virNetDevOpenvswitchPortParse(cur))) + goto error;

Again, I think I prefer the idea of re-using <virtualport> instead of creating a new element type.

...

} else if ((network == NULL)&& ((def->type == VIR_DOMAIN_NET_TYPE_SERVER) || (def->type == VIR_DOMAIN_NET_TYPE_CLIENT) || @@ -3885,6 +3916,14 @@ virDomainNetDefParseXML(virCapsPtr caps, } break;

+ case VIR_DOMAIN_NET_TYPE_OPENVSWITCH: + if (bridge == NULL) { + virDomainReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("No<source> 'bridge' attribute specified with<interface type='openvswitch'/>")); + goto error; + } + break; + case VIR_DOMAIN_NET_TYPE_CLIENT: case VIR_DOMAIN_NET_TYPE_SERVER: case VIR_DOMAIN_NET_TYPE_MCAST: @@ -10279,6 +10318,7 @@ virDomainActualNetDefFormat(virBufferPtr buf, }

if (def->type != VIR_DOMAIN_NET_TYPE_BRIDGE&& + def->type != VIR_DOMAIN_NET_TYPE_OPENVSWITCH&& def->type != VIR_DOMAIN_NET_TYPE_DIRECT&& def->type != VIR_DOMAIN_NET_TYPE_NETWORK) { virDomainReportError(VIR_ERR_INTERNAL_ERROR, @@ -10312,6 +10352,14 @@ virDomainActualNetDefFormat(virBufferPtr buf, goto error; virBufferAdjustIndent(buf, -8); break; + case VIR_DOMAIN_NET_TYPE_OPENVSWITCH: + virBufferEscapeString(buf, "<source bridge='%s'/>\n", + def->data.ovs.ovsPort->brname); + virBufferAdjustIndent(buf, 6); + if (virNetDevOpenvswitchPortFormat(def->data.ovs.ovsPort, buf)< 0) + return -1; + virBufferAdjustIndent(buf, -6); + break; default: break; } @@ -10408,6 +10456,14 @@ virDomainNetDefFormat(virBufferPtr buf, virBufferAdjustIndent(buf, -6); break;

+ case VIR_DOMAIN_NET_TYPE_OPENVSWITCH: + virBufferEscapeString(buf, "<source bridge='%s'/>\n", + def->data.ovs.ovsPort->brname); + virBufferAdjustIndent(buf, 6); + if (virNetDevOpenvswitchPortFormat(def->data.ovs.ovsPort, buf)< 0) + return -1; + virBufferAdjustIndent(buf, -6); + break; case VIR_DOMAIN_NET_TYPE_USER: case VIR_DOMAIN_NET_TYPE_LAST: break; @@ -13729,6 +13785,18 @@ virDomainNetGetActualBandwidth(virDomainNetDefPtr iface) return iface->bandwidth; }

+virNetDevOpenvswitchPortPtr +virDomainNetGetActualOpenvswitchPortPtr(virDomainNetDefPtr iface) +{ + if (iface->type == VIR_DOMAIN_NET_TYPE_OPENVSWITCH) + return iface->data.ovs.ovsPort; + if (iface->type != VIR_DOMAIN_NET_TYPE_NETWORK) + return NULL; + if (!iface->data.network.actual) + return NULL; + return iface->data.network.actual->data.ovs.ovsPort; +} +

Right. Even though you claim you didn't understand it, and you wouldn't *need* it right away, you've done a reasonable job of filling out the actualnetdef for this new type :-)

...

/* Return listens[ii] from the appropriate union for the graphics * type, or NULL if this is an unsuitable type, or the index is out of diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 3b522a9..5d0cfcd 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -41,6 +41,7 @@ # include "virnetdevmacvlan.h" # include "sysinfo.h" # include "virnetdevvportprofile.h" +# include "virnetdevopenvswitch.h" # include "virnetdevbandwidth.h"

/* Different types of hypervisor */ @@ -525,6 +526,7 @@ enum virDomainNetType { VIR_DOMAIN_NET_TYPE_BRIDGE, VIR_DOMAIN_NET_TYPE_INTERNAL, VIR_DOMAIN_NET_TYPE_DIRECT, + VIR_DOMAIN_NET_TYPE_OPENVSWITCH,

VIR_DOMAIN_NET_TYPE_LAST, }; @@ -574,6 +576,9 @@ struct _virDomainActualNetDef { int mode; /* enum virMacvtapMode from util/macvtap.h */ virNetDevVPortProfilePtr virtPortProfile; } direct; + struct { + virNetDevOpenvswitchPortPtr ovsPort;

If you know that the ovsPort will always be valid any time type='openvswitch', it would make memory management/error recovery simpler to just make this virNetDevOpenvswithcPort, rather than a pointer to one. Normally a pointer to a struct is used if something is optional (so NULL will indicate that it was missing from the config), or if someone just has fun doing malloc/free calisthenics ;-)

...

+ } ovs; } data; virNetDevBandwidthPtr bandwidth; }; @@ -628,6 +633,9 @@ struct _virDomainNetDef { int mode; /* enum virMacvtapMode from util/macvtap.h */ virNetDevVPortProfilePtr virtPortProfile; } direct; + struct { + virNetDevOpenvswitchPortPtr ovsPort;

Same here.

...

+ } ovs; } data; struct { bool sndbuf_specified; @@ -1860,6 +1868,9 @@ virDomainNetGetActualDirectVirtPortProfile(virDomainNetDefPtr iface); virNetDevBandwidthPtr virDomainNetGetActualBandwidth(virDomainNetDefPtr iface);

+virNetDevOpenvswitchPortPtr +virDomainNetGetActualOpenvswitchPortPtr(virDomainNetDefPtr iface); + int virDomainControllerInsert(virDomainDefPtr def, virDomainControllerDefPtr controller); void virDomainControllerInsertPreAlloced(virDomainDefPtr def, diff --git a/src/conf/netdev_openvswitch_conf.c b/src/conf/netdev_openvswitch_conf.c new file mode 100644 index 0000000..f83b8bf --- /dev/null +++ b/src/conf/netdev_openvswitch_conf.c @@ -0,0 +1,75 @@ +/* + * Copyright (C) 2012 Nicira, Inc. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * Authors: + * Dan Wendlandt<dan@nicira.com> + */ + +#include<config.h> + +#include "netdev_openvswitch_conf.h" +#include "virterror_internal.h" +#include "memory.h" + +#define VIR_FROM_THIS VIR_FROM_NONE +#define virNetDevError(code, ...) \ + virReportErrorHelper(VIR_FROM_THIS, code, __FILE__, \ + __FUNCTION__, __LINE__, __VA_ARGS__) + + +virNetDevOpenvswitchPortPtr +virNetDevOpenvswitchPortParse(xmlNodePtr node) +{ + virNetDevOpenvswitchPortPtr ovsPort = NULL; + xmlNodePtr cur = node->children; + + if (VIR_ALLOC(ovsPort)< 0) { + virReportOOMError(); + return NULL; + } + + ovsPort->InterfaceID = virXMLPropString(node, "interfaceid"); + + if (!ovsPort->InterfaceID || strlen(ovsPort->InterfaceID) == 0) { + // interfaceID does not have to be a UUID, + // but a UUID is a reasonable default + virAlloc(ovsPort->InterfaceID, VIR_UUID_STRING_BUFLEN); + if (virUUIDGenerate(ovsPort->InterfaceID)) { + virNetDevError(VIR_ERR_XML_ERROR, "%s", + _("cannot generate a random uuid for interfaceid")); + goto error; + } + } + +cleanup: + return ovsPort; + +error: + VIR_FREE(ovsPort); + goto cleanup; +} + + +int +virNetDevOpenvswitchPortFormat(virNetDevOpenvswitchPortPtr ovsPort, + virBufferPtr buf) +{ + if (ovsPort) + virBufferAsprintf(buf, "<openvswitchport interfaceid='%s'/>\n", + ovsPort->InterfaceID); + return 0; +} diff --git a/src/conf/netdev_openvswitch_conf.h b/src/conf/netdev_openvswitch_conf.h new file mode 100644 index 0000000..d0d077b --- /dev/null +++ b/src/conf/netdev_openvswitch_conf.h @@ -0,0 +1,38 @@ +/* + * Copyright (C) 2012 Nicira, Inc. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * Authors: + * Dan Wendlandt<dan@nicira.com> + */ + +#ifndef __VIR_NETDEV_OPENVSWITCH_CONF_H__ +# define __VIR_NETDEV_OPENVSWITCH_CONF_H__ + +# include "internal.h" +# include "virnetdevopenvswitch.h" +# include "buf.h" +# include "xml.h" + +virNetDevOpenvswitchPortPtr +virNetDevOpenvswitchPortParse(xmlNodePtr node); + +int +virNetDevOpenvswitchPortFormat(virNetDevOpenvswitchPortPtr ovsPort, + virBufferPtr buf); + + +#endif /* __VIR_NETDEV_OPENVSWITCH_CONF_H__ */

For something this short, it may not be worth having an extra file (requiring the extra export symbols, etc) - it could just be added to domain_conf.h (of course someone else may be planning to split up domain_conf.h as we speak, so...) Of course, if you follow the advice of re-using virtualport for the interfaceID, then you won't need this anyway.

...

diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 04ae35c..1aa78a7 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -378,6 +378,7 @@ virDomainNetGetActualBridgeName; virDomainNetGetActualDirectDev; virDomainNetGetActualDirectMode; virDomainNetGetActualDirectVirtPortProfile; +virDomainNetGetActualOpenvswitchPortPtr; virDomainNetGetActualType; virDomainNetIndexByMac; virDomainNetInsert; @@ -741,6 +742,10 @@ nlComm; virNetDevBandwidthFormat; virNetDevBandwidthParse;

+# netdev_openvswitch_conf.h +virNetDevOpenvswitchPortFormat; +virNetDevOpenvswitchPortParse; +

# netdev_vportprofile_conf.h virNetDevVPortProfileFormat; @@ -1238,6 +1243,10 @@ virNetDevMacVLanCreateWithVPortProfile; virNetDevMacVLanDeleteWithVPortProfile;

+# virnetdevopenvswitch.h +virNetDevOpenvswitchAddPort; + + # virnetdevtap.h virNetDevTapCreate; virNetDevTapCreateInBridgePort; diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c index 5d0d528..d4989c8 100644 --- a/src/network/bridge_driver.c +++ b/src/network/bridge_driver.c @@ -1766,7 +1766,7 @@ networkStartNetworkVirtual(struct network_driver *driver, goto err0; } if (virNetDevTapCreateInBridgePort(network->def->bridge, -&macTapIfName, network->def->mac, 0, false, NULL)< 0) { +&macTapIfName, network->def->mac, 0, false, NULL, NULL)< 0) { VIR_FREE(macTapIfName); goto err0; } diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index aaccf62..6f411c8 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -221,6 +221,13 @@ qemuNetworkIfaceConnect(virDomainDefPtr def, virReportOOMError(); return -1; } + } else if (actualType == VIR_DOMAIN_NET_TYPE_OPENVSWITCH) { + virNetDevOpenvswitchPortPtr p = + virDomainNetGetActualOpenvswitchPortPtr(net); + if (!(brname = strdup(p->brname))) { + virReportOOMError(); + return -1; + } } else { qemuReportError(VIR_ERR_INTERNAL_ERROR, _("Network type %d is not supported"), @@ -247,7 +254,7 @@ qemuNetworkIfaceConnect(virDomainDefPtr def, memcpy(tapmac, net->mac, VIR_MAC_BUFLEN); tapmac[0] = 0xFE; /* Discourage bridge from using TAP dev MAC */ err = virNetDevTapCreateInBridgePort(brname,&net->ifname, tapmac, - vnet_hdr, true,&tapfd); + vnet_hdr, true,&tapfd, net); virDomainAuditNetDevice(def, net, "/dev/net/tun", tapfd>= 0); if (err< 0) { if (template_ifname) @@ -2539,6 +2546,7 @@ qemuBuildHostNetStr(virDomainNetDefPtr net, switch (netType) { case VIR_DOMAIN_NET_TYPE_NETWORK: case VIR_DOMAIN_NET_TYPE_BRIDGE: + case VIR_DOMAIN_NET_TYPE_OPENVSWITCH: case VIR_DOMAIN_NET_TYPE_DIRECT: virBufferAddLit(&buf, "tap"); virBufferAsprintf(&buf, "%cfd=%s", type_sep, tapfd); @@ -2587,6 +2595,7 @@ qemuBuildHostNetStr(virDomainNetDefPtr net, case VIR_DOMAIN_NET_TYPE_ETHERNET: case VIR_DOMAIN_NET_TYPE_NETWORK: case VIR_DOMAIN_NET_TYPE_BRIDGE: + case VIR_DOMAIN_NET_TYPE_OPENVSWITCH: case VIR_DOMAIN_NET_TYPE_INTERNAL: case VIR_DOMAIN_NET_TYPE_DIRECT: case VIR_DOMAIN_NET_TYPE_LAST: diff --git a/src/uml/uml_conf.c b/src/uml/uml_conf.c index 316d558..4ad2d90 100644 --- a/src/uml/uml_conf.c +++ b/src/uml/uml_conf.c @@ -142,7 +142,7 @@ umlConnectTapDevice(virConnectPtr conn, memcpy(tapmac, net->mac, VIR_MAC_BUFLEN); tapmac[0] = 0xFE; /* Discourage bridge from using TAP dev MAC */ if (virNetDevTapCreateInBridgePort(bridge,&net->ifname, tapmac, - 0, true, NULL)< 0) { + 0, true, NULL, net)< 0) { if (template_ifname) VIR_FREE(net->ifname); goto error; @@ -238,6 +238,7 @@ umlBuildCommandLineNet(virConnectPtr conn, }

case VIR_DOMAIN_NET_TYPE_BRIDGE: + case VIR_DOMAIN_NET_TYPE_OPENVSWITCH: if (umlConnectTapDevice(conn, vm, def, def->data.bridge.brname)< 0) goto error; diff --git a/src/util/virnetdevopenvswitch.c b/src/util/virnetdevopenvswitch.c new file mode 100644 index 0000000..0c96116 --- /dev/null +++ b/src/util/virnetdevopenvswitch.c @@ -0,0 +1,79 @@ +/* + * Copyright (C) 2012 Nicira, Inc. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * Authors: + * Dan Wendlandt<dan@nicira..com> + */ + +#include<config.h> + +#include "virnetdevopenvswitch.h" +#include "command.h" +#include "memory.h" +#include "virterror_internal.h" +#include "ignore-value.h" + +#define VIR_FROM_THIS VIR_FROM_NONE + +/** + * virNetDevOpenvswitchAddPort: + * @brname: the bridge name + * @ifname: the network interface name + * @macaddr: the mac address of the virtual interface + * + * Adds an interface to an OVS bridge + * + * Returns 0 in case of success or an errno code in case of failure. + */

Actually you're returning -1 in case of failure. If you were going to return errno, you should instead return -errno, which is a standard that we're trying to adopt in all the libvirt code.

...

+int virNetDevOpenvswitchAddPort(const char *brname, const char *ifname, + const unsigned char *macaddr, + virNetDevOpenvswitchPortPtr ovs_port) +{ + int ret = -1; + virCommandPtr cmd = NULL; + char *attachedmac_ex_id = NULL; + char *ifaceid_ex_id = NULL; + + if (virAsprintf(&attachedmac_ex_id, "external-ids:attached-mac=%s", macaddr)< 0) + goto cleanup; + if (virAsprintf(&ifaceid_ex_id, "external-ids:iface-id=%s", ovs_port->InterfaceID)< 0) + goto cleanup; + + cmd = virCommandNew(OVSVSCTL); + virCommandAddArgList(cmd, "--", "--may-exist", "add-port", + brname, ifname, + "--", "set", "Interface", ifname, attachedmac_ex_id, + "--", "set", "Interface", ifname, ifaceid_ex_id, + "--", "set", "Interface", ifname, + "external-ids:iface-status=active", + NULL); + + if (virCommandRun(cmd, NULL)< 0) + { + virReportSystemError(VIR_ERR_INTERNAL_ERROR, + _("Unable to add port %s to OVS bridge %s"), ifname, brname);

You'll need to use a different log function - virReportSystemError expects its first arg to be a valid errno, but virCommandRun returns -1 on failure, and errno has already been reset by now.

...

+ goto cleanup; + } + ret = 0; + + cleanup: + VIR_FREE(attachedmac_ex_id); + VIR_FREE(ifaceid_ex_id); + virCommandFree(cmd); + return ret; +} + diff --git a/src/util/virnetdevopenvswitch.h b/src/util/virnetdevopenvswitch.h new file mode 100644 index 0000000..bc6ecb3 --- /dev/null +++ b/src/util/virnetdevopenvswitch.h @@ -0,0 +1,41 @@ +/* + * Copyright (C) 2012 Nicira, Inc. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * Authors: + * Dan Wendlandt<dan@nicira..com> + */ + +#ifndef __VIR_NETDEV_OPENVSWITCH_H__ +# define __VIR_NETDEV_OPENVSWITCH_H__ + +# include "internal.h" + +typedef struct _virNetDevOpenvswitchPort virNetDevOpenvswitchPort; +typedef virNetDevOpenvswitchPort *virNetDevOpenvswitchPortPtr; +struct _virNetDevOpenvswitchPort { + char *brname; + char *InterfaceID; +}; + +int virNetDevOpenvswitchAddPort(const char *brname, const char *ifname, + const unsigned char *macaddr, + virNetDevOpenvswitchPortPtr ovs_port) + + ATTRIBUTE_NONNULL(1) ATTRIBUTE_RETURN_CHECK; + + +#endif /* __VIR_NETDEV_OPENVSWITCH_H__ */ diff --git a/src/util/virnetdevtap.c b/src/util/virnetdevtap.c index 2ed53c6..3368051 100644 --- a/src/util/virnetdevtap.c +++ b/src/util/virnetdevtap.c @@ -25,6 +25,7 @@ #include "virnetdevtap.h" #include "virnetdev.h" #include "virnetdevbridge.h" +#include "virnetdevopenvswitch.h" #include "virterror_internal.h" #include "virfile.h" #include "virterror_internal.h" @@ -249,6 +250,7 @@ int virNetDevTapDelete(const char *ifname ATTRIBUTE_UNUSED) * @macaddr: desired MAC address (VIR_MAC_BUFLEN long) * @vnet_hdr: whether to try enabling IFF_VNET_HDR * @tapfd: file descriptor return value for the new tap device + * @net: the net descriptor * * This function creates a new tap device on a bridge. @ifname can be either * a fixed name or a name template with '%d' for dynamic name allocation. @@ -265,8 +267,16 @@ int virNetDevTapCreateInBridgePort(const char *brname, const unsigned char *macaddr, int vnet_hdr, bool up, - int *tapfd) + int *tapfd, + virDomainNetDefPtr net)

I don't think I like putting something as high level as a NetDef in the arg list of this low level utility function. All the existing parameters are fairly generic, so the file could be easily transported to a different project and used with little or no modification. Instead, you might want to have something like this: int virNetDevTapCreateInBridgePort(const char *brname, int type; const unsigned char *macaddr, const char *interfaceID, int vnet_hdr, bool up, int *tapfd) Or alternately, keep the existing function clean, and write a new one using the same building blocks.

...

{ + int actualType; + + if (net) + actualType = virDomainNetGetActualType(net); + else + actualType = VIR_DOMAIN_NET_TYPE_BRIDGE; + if (virNetDevTapCreate(ifname, vnet_hdr, tapfd)< 0) return -1;

@@ -286,8 +296,14 @@ int virNetDevTapCreateInBridgePort(const char *brname, if (virNetDevSetMTUFromDevice(*ifname, brname)< 0) goto error;

- if (virNetDevBridgeAddPort(brname, *ifname)< 0) - goto error; + if (actualType == VIR_DOMAIN_NET_TYPE_OPENVSWITCH) { + virNetDevOpenvswitchPortPtr p = virDomainNetGetActualOpenvswitchPortPtr(net); + if (virNetDevOpenvswitchAddPort(brname, *ifname, macaddr, p)< 0) + goto error; + } else { + if (virNetDevBridgeAddPort(brname, *ifname)< 0) + goto error; + }

if (virNetDevSetOnline(*ifname, up)< 0) goto error; diff --git a/src/util/virnetdevtap.h b/src/util/virnetdevtap.h index fb35ac5..e0be898 100644 --- a/src/util/virnetdevtap.h +++ b/src/util/virnetdevtap.h @@ -24,6 +24,7 @@ # define __VIR_NETDEV_TAP_H__

# include "internal.h" +# include "conf/domain_conf.h"

int virNetDevTapCreate(char **ifname, int vnet_hdr, @@ -38,7 +39,8 @@ int virNetDevTapCreateInBridgePort(const char *brname, const unsigned char *macaddr, int vnet_hdr, bool up, - int *tapfd) + int *tapfd, + virDomainNetDefPtr net)

Same here.

...

ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3) ATTRIBUTE_RETURN_CHECK;

-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

Laine Stump

6 a.m.

On 01/27/2012 05:58 AM, Dan Wendlandt wrote:

...

Hello all,

I know of many people who want to spin up VMs using libvirt + kvm/qemu and attach those VMs to an openvswitch bridge (see: http://www.openvswitch.org). However, the only way I know of to get this working is a kludge that uses to tap devices along with <interface type="ethernet"> while running ovs-vsctl outside of libvirt. Even worse, doing this on RHEL/Fedora seems to require privilege tweaks (e.g., running qemu as root, not dropping capabilities), which may not be acceptable for production deployments (see: http://fedoraproject.org/wiki/How_to_debug_Virtualization_problems#Errors_us... ).

So I would like to start taking steps toward better libvirt/openvswitch integration. My initial step has the fairly limit goal of enabling kvm/qemu VM NICs to attach to an openvswitch bridge in much the same way VM NIC can already attached to the linux bridge. For example, specifying:

<interface type="openvswitch"> <source bridge="br0"/> <mac address="ca:fe:de;ad:be:ef"/> </interface>

It occurred to me while responding to Kyle's post about an openvswitch network type that maybe this would be better done as: <interface type='bridge'> <source bridge='br0' type='openvswitch'/> ... This way any management application that hadn't yet been taught about the openvswitch extensions would still be able to report useful information. This may also make some of the switch statements simpler (as well as the change to the NetDef - it would just need an interfaceID added to the bridge case of the union). (I made a similar suggestion for the network definition as, I see, did Roopa Prabhu, who managed to hit send before I did :-)).

Daniel P. Berrange

31 Jan 31 Jan

8:57 p.m.

On Fri, Jan 27, 2012 at 02:58:58AM -0800, Dan Wendlandt wrote:

...

Hello all,

I know of many people who want to spin up VMs using libvirt + kvm/qemu and attach those VMs to an openvswitch bridge (see: http://www.openvswitch.org). However, the only way I know of to get this working is a kludge that uses to tap devices along with <interface type="ethernet"> while running ovs-vsctl outside of libvirt. Even worse, doing this on RHEL/Fedora seems to require privilege tweaks (e.g., running qemu as root, not dropping capabilities), which may not be acceptable for production deployments (see: http://fedoraproject.org/wiki/How_to_debug_Virtualization_problems#Errors_us...).

So I would like to start taking steps toward better libvirt/openvswitch integration. My initial step has the fairly limit goal of enabling kvm/qemu VM NICs to attach to an openvswitch bridge in much the same way VM NIC can already attached to the linux bridge. For example, specifying:

<interface type="openvswitch"> <source bridge="br0"/> <mac address="ca:fe:de;ad:be:ef"/> </interface>

IMHO we should not be introducing a new "type" for OpenVSwitch. Contrary to common understanding, type='bridge' is not referring explicitly to Linux software bridging. Rather it refers to the concept of bridging the guest to the LAN at the network level, of which Linux software briding is one possible impl. OpenVSwitch is another possible impl. Other hypervisors have different impls too of course. If OpenVSwitch is available in the kernel, is there really any reason to *not* use it ? ie, could we just have <interface type="bridge"> <source bridge="br0"/> <mac address="ca:fe:de;ad:be:ef"/> </interface> and if we see that 'br0' is using OpenVSwitch, then libvirt can know to just do the right thing. That way every application that uses libvirt today will automatically be able to take advantage of the benefits OpenVSwitch brings without further work

...

I also wanted to add a new child element of <interface> that can be used to specify some OVS specific configuration. To start, I just want to expose a single 'interfaceid' value (this parameter is specific to openvswitch). Extending the previous example:

<interface type="openvswitch"> <source bridge="br0"/> <mac address="ca:fe:de;ad:be:ef"/> <openvswitchport interfaceid="interface-xyz"/> </interface>

We already have a "<virtualport>" element under <interface> that we use for setting 802.1Qb[gh] parameters. It seems to me that we can use this existing syntax to cope with the openvswitch port interfaceid perhaps ? Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Laine Stump

1 Feb 1 Feb

4:26 a.m.

On 01/31/2012 06:57 AM, Daniel P. Berrange wrote:

...

On Fri, Jan 27, 2012 at 02:58:58AM -0800, Dan Wendlandt wrote:

...
Hello all,

I know of many people who want to spin up VMs using libvirt + kvm/qemu and attach those VMs to an openvswitch bridge (see: http://www.openvswitch.org). However, the only way I know of to get this working is a kludge that uses to tap devices along with<interface type="ethernet"> while running ovs-vsctl outside of libvirt. Even worse, doing this on RHEL/Fedora seems to require privilege tweaks (e.g., running qemu as root, not dropping capabilities), which may not be acceptable for production deployments (see: http://fedoraproject.org/wiki/How_to_debug_Virtualization_problems#Errors_us...).

So I would like to start taking steps toward better libvirt/openvswitch integration. My initial step has the fairly limit goal of enabling kvm/qemu VM NICs to attach to an openvswitch bridge in much the same way VM NIC can already attached to the linux bridge. For example, specifying:

<interface type="openvswitch"> <source bridge="br0"/> <mac address="ca:fe:de;ad:be:ef"/> </interface> IMHO we should not be introducing a new "type" for OpenVSwitch. Contrary to common understanding, type='bridge' is not referring explicitly to Linux software bridging. Rather it refers to the concept of bridging the guest to the LAN at the network level, of which Linux software briding is one possible impl. OpenVSwitch is another possible impl. Other hypervisors have different impls too of course.

If OpenVSwitch is available in the kernel, is there really any reason to *not* use it ? ie, could we just have

<interface type="bridge"> <source bridge="br0"/> <mac address="ca:fe:de;ad:be:ef"/> </interface>

and if we see that 'br0' is using OpenVSwitch, then libvirt can know to just do the right thing.

Of course! <facepalm/> If the code can tell that br0 is an openvswitch bridge rather than a linux host bridge, we don't even have to add type='openvswitch' to the <source> element! I definitely like this the best, too.

...

That way every application that uses libvirt today will automatically be able to take advantage of the benefits OpenVSwitch brings without further work

Except that an interfaceid is needed (or is that optional?)

Ansis Atteka

2 Feb 2 Feb

11:30 p.m.

Libvirt has a function virNetDevBridgeRemovePort() which can remove port from the Linux Bridge, but it seems that no one calls it. Wanted to confirm if port removal happens automatically for Linux Bridges if VM goes down? The difference between OVS and Linux Bridge is that OVS will need a hook that removes all ports on VM shutdown event (and maybe also for some other events?). Thanks, Ansis On Tue, Jan 31, 2012 at 9:26 PM, Laine Stump <laine@laine.org> wrote:

...

On 01/31/2012 06:57 AM, Daniel P. Berrange wrote:

...
On Fri, Jan 27, 2012 at 02:58:58AM -0800, Dan Wendlandt wrote:

...
Hello all,

I know of many people who want to spin up VMs using libvirt + kvm/qemu and attach those VMs to an openvswitch bridge (see: http://www.openvswitch.org). However, the only way I know of to get this working is a kludge that uses to tap devices along with<interface type="ethernet"> while running ovs-vsctl outside of libvirt. Even worse, doing this on RHEL/Fedora seems to require privilege tweaks (e.g., running qemu as root, not dropping capabilities), which may not be acceptable for production deployments (see: http://fedoraproject.org/wiki/**How_to_debug_Virtualization_** problems#Errors_using_.**3Cinterface_type.3D.**27ethernet.27.2F.3E<http://fedoraproject.org/wiki/How_to_debug_Virtualization_problems#Errors_using_.3Cinterface_type.3D.27ethernet.27.2F.3E> ).

So I would like to start taking steps toward better libvirt/openvswitch integration. My initial step has the fairly limit goal of enabling kvm/qemu VM NICs to attach to an openvswitch bridge in much the same way VM NIC can already attached to the linux bridge. For example, specifying:

<interface type="openvswitch"> <source bridge="br0"/> <mac address="ca:fe:de;ad:be:ef"/> </interface>

IMHO we should not be introducing a new "type" for OpenVSwitch. Contrary to common understanding, type='bridge' is not referring explicitly to Linux software bridging. Rather it refers to the concept of bridging the guest to the LAN at the network level, of which Linux software briding is one possible impl. OpenVSwitch is another possible impl. Other hypervisors have different impls too of course.

If OpenVSwitch is available in the kernel, is there really any reason to *not* use it ? ie, could we just have

<interface type="bridge"> <source bridge="br0"/> <mac address="ca:fe:de;ad:be:ef"/> </interface>

and if we see that 'br0' is using OpenVSwitch, then libvirt can know to just do the right thing.

Of course! <facepalm/> If the code can tell that br0 is an openvswitch bridge rather than a linux host bridge, we don't even have to add type='openvswitch' to the <source> element!

I definitely like this the best, too.

That way every application that uses

...
libvirt today will automatically be able to take advantage of the benefits OpenVSwitch brings without further work

Except that an interfaceid is needed (or is that optional?)

-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/**mailman/listinfo/libvir-list<https://www.redhat.com/mailman/listinfo/libvir-list>

Laine Stump

3 Feb 3 Feb

1:08 a.m.

On 02/02/2012 09:30 AM, Ansis Atteka wrote:

...

Libvirt has a function virNetDevBridgeRemovePort() which can remove port from the Linux Bridge, but it seems that no one calls it.

Wanted to confirm if port removal happens automatically for Linux Bridges if VM goes down?

I didn't write that code, and have never traced through to verify this, but here's my understanding: * the tap device is created non-persistent and attached to its bridge (in virNetDevTapCreateInBridgePort()) * the fd of the open tap device is passed to qemu, and qemu uses it for network communication. * when it's time to detach the device or destroy the guest, libvirt just sends a monitor command to qemu, which ends up closing the tap device. Because the tap device was non-persistent, that automatically leads to 1) removal of the tap device from the bridge, and 2) deletion of the tap device itself. If a non-persistent tap device that is attached to an OVS is closed, does OVS not notice this and automatically detach it? You may want to experiment with that; possibly nothing is needed. (it would be much better if not, because otherwise there will need to be special care taken to prevent dangling tap devices (or dangling references to deleted tap devices))

...

The difference between OVS and Linux Bridge is that OVS will need a hook that removes all ports on VM shutdown event (and maybe also for some other events?).

Not just when a guest is shutdown, but also if a network device is detached from a running domain. If it's necessary to explicitly detach the tap from the OVS, whatever hook is added in to do that can hopefully just as well be identical for a Linux bridge (i.e., the only OVS-specific code should be in the lowest level function that does that bridge detach). Another point - since a shutdown initiated by the guest would likely end up destroying the tap device, we can't just add in a hook to detach it from the bridge - too early and the guest won't be done with it yet, too late and it will already not exist. I'm thinking that instead we may need to create the tap as persistent, then explicitly detach it from the bridge and delete it after the domain is finished with it. If that is needed, I think it should be done in a separate patch that is a prerequisite to the OVS patch. That way the two things can be tested independent of each other. As soon as I get out the patch I'm working on now, I'll take a quick look at this and see if I can point your more in the right direction for this prerequisite patch. In the meantime, it would be useful if you could do the experiment I mentioned above (i.e. do nothing and see if it explodes), and modify virNetDevBridgeRemovePort for your patch to do the right thing in the case of the bridge device being an OVS.

Ansis Atteka

3:28 a.m.

On Thu, Feb 2, 2012 at 6:08 PM, Laine Stump <laine@laine.org> wrote:

...

On 02/02/2012 09:30 AM, Ansis Atteka wrote:

...
Libvirt has a function virNetDevBridgeRemovePort() which can remove port from the Linux Bridge, but it seems that no one calls it.

Wanted to confirm if port removal happens automatically for Linux Bridges if VM goes down?

I didn't write that code, and have never traced through to verify this, but here's my understanding:

* the tap device is created non-persistent and attached to its bridge (in virNetDevTapCreateInBridgePort**())

* the fd of the open tap device is passed to qemu, and qemu uses it for network communication.

* when it's time to detach the device or destroy the guest, libvirt just sends a monitor command to qemu, which ends up closing the tap device. Because the tap device was non-persistent, that automatically leads to 1) removal of the tap device from the bridge, and 2) deletion of the tap device itself.

Yeah, the problem is that OVS does not do 1) when tap device gets destroyed.

...

If a non-persistent tap device that is attached to an OVS is closed, does OVS not notice this and automatically detach it? You may want to experiment with that; possibly nothing is needed.

...

(it would be much better if not, because otherwise there will need to be special care taken to prevent dangling tap devices (or dangling references to deleted tap devices))

The difference between OVS and

...
Linux Bridge is that OVS will need a hook that removes all ports on VM shutdown event (and maybe also for some other events?).

Not just when a guest is shutdown, but also if a network device is detached from a running domain.

If it's necessary to explicitly detach the tap from the OVS, whatever hook is added in to do that can hopefully just as well be identical for a Linux bridge (i.e., the only OVS-specific code should be in the lowest level function that does that bridge detach).

Another point - since a shutdown initiated by the guest would likely end up destroying the tap device, we can't just add in a hook to detach it from the bridge - too early and the guest won't be done with it yet, too late and it will already not exist. I'm thinking that instead we may need to create the tap as persistent, then explicitly detach it from the bridge and delete it after the domain is finished with it.

It wouldn't be too late. It's ok If actual tap device is not alive anymore.

...

If that is needed, I think it should be done in a separate patch that is a prerequisite to the OVS patch. That way the two things can be tested independent of each other.

As soon as I get out the patch I'm working on now, I'll take a quick look at this and see if I can point your more in the right direction for this prerequisite patch. In the meantime, it would be useful if you could do the experiment I mentioned above (i.e. do nothing and see if it explodes), and modify virNetDevBridgeRemovePort for your patch to do the right thing in the case of the bridge device being an OVS.

If by experiment you meant "Whether OVS automatically detaches tap device from OVS bridge when tap device gets closed?" then I can confirm that in contrast to Linux Bridge it does not do that. I will look into possibilities to remove ports on "detach-interface" and "VM shutdown" events.

Laine Stump

5:10 a.m.

On 02/02/2012 01:28 PM, Ansis Atteka wrote:

...

On Thu, Feb 2, 2012 at 6:08 PM, Laine Stump <laine@laine.org <mailto:laine@laine.org>> wrote:

On 02/02/2012 09:30 AM, Ansis Atteka wrote:

Libvirt has a function virNetDevBridgeRemovePort() which can remove port from the Linux Bridge, but it seems that no one calls it.

Wanted to confirm if port removal happens automatically for Linux Bridges if VM goes down?

* when it's time to detach the device or destroy the guest, libvirt just sends a monitor command to qemu, which ends up closing the tap device. Because the tap device was non-persistent, that automatically leads to 1) removal of the tap device from the bridge, and 2) deletion of the tap device itself.

...

Yeah, the problem is that OVS does not do 1) when tap device gets destroyed.

Interesting. So what happens if there is traffic for a port on the switch that has a now-nonexistent tap device? Does it ignore it? Explode?

...

If a non-persistent tap device that is attached to an OVS is closed, does OVS not notice this and automatically detach it? You may want to experiment with that; possibly nothing is needed.

(it would be much better if not, because otherwise there will need to be special care taken to prevent dangling tap devices (or dangling references to deleted tap devices))

The difference between OVS and Linux Bridge is that OVS will need a hook that removes all ports on VM shutdown event (and maybe also for some other events?).

Not just when a guest is shutdown, but also if a network device is detached from a running domain.

If it's necessary to explicitly detach the tap from the OVS, whatever hook is added in to do that can hopefully just as well be identical for a Linux bridge (i.e., the only OVS-specific code should be in the lowest level function that does that bridge detach).

Another point - since a shutdown initiated by the guest would likely end up destroying the tap device, we can't just add in a hook to detach it from the bridge - too early and the guest won't be done with it yet, too late and it will already not exist. I'm thinking that instead we may need to create the tap as persistent, then explicitly detach it from the bridge and delete it after the domain is finished with it.

...

It wouldn't be too late. It's ok If actual tap device is not alive anymore.

Well, as long as there are no negative consequences to the port being assigned past the time when the tap device is deleted. As I mentioned before, you should modify virNetDevBridgeRemovePort to do this removal (and do it appropriately depending on the type of bridge), but change it so that it should return success if it would fail simply because the tap device already is not on the bridge. (This way we can leave the tap device as non-persistent, and it will be an effective NOP for tap devices on linux bridges. Also, for consistency we should be just always calling the function to detach from the bridge if virDomainNetGetActualType(net) == VIR_DOMAIN_NET_TYPE_BRIDGE, regardless of whether it's a linux bridge or ovs. As far as where to put the calls to this - look for where there are calls to networkReleaseActualDevice(), and do it up above that (for example, you can see in qemuDomainDetachNetDevice() how there is an "if virDomainNetGetActualType(detach) == VIR_DOMAIN_NET_TYPE_DIRECT)" - you can change that to a switch(actualtype), and add a case for VIR_DOMAIN_NET_TYPE_BRIDGE that calls virNetDevBridgeRemovePort().

...

If that is needed, I think it should be done in a separate patch that is a prerequisite to the OVS patch. That way the two things can be tested independent of each other.

As soon as I get out the patch I'm working on now, I'll take a quick look at this and see if I can point your more in the right direction for this prerequisite patch. In the meantime, it would be useful if you could do the experiment I mentioned above (i.e. do nothing and see if it explodes), and modify virNetDevBridgeRemovePort for your patch to do the right thing in the case of the bridge device being an OVS.

If by experiment you meant "Whether OVS automatically detaches tap device from OVS bridge when tap device gets closed?" then I can confirm that in contrast to Linux Bridge it does not do that. I will look into possibilities to remove ports on "detach-interface" and "VM shutdown" events.

"detach-interface" = qemuDomainDetachNetDevice() - see above. "VM shutdown" = qemuProcessStop() - almost exactly the same situation as detach (turn the if() into a switch() and add a case for bridged devices). Don't forget LXC support! :-) It can use bridge network devices too. There are a few other places where it may be appropriate to do the bridge removal during error paths; this same search may show you some of them, and some others may show up when you search for where virNetDevTapCreateInBridgePort.

Ansis Atteka

5:47 a.m.

On Thu, Feb 2, 2012 at 10:10 PM, Laine Stump <laine@laine.org> wrote:

...

On 02/02/2012 01:28 PM, Ansis Atteka wrote:

On Thu, Feb 2, 2012 at 6:08 PM, Laine Stump <laine@laine.org> wrote:

...
On 02/02/2012 09:30 AM, Ansis Atteka wrote:

...
Libvirt has a function virNetDevBridgeRemovePort() which can remove port from the Linux Bridge, but it seems that no one calls it.

Wanted to confirm if port removal happens automatically for Linux Bridges if VM goes down?

* when it's time to detach the device or destroy the guest, libvirt just sends a monitor command to qemu, which ends up closing the tap device. Because the tap device was non-persistent, that automatically leads to 1) removal of the tap device from the bridge, and 2) deletion of the tap device itself.

Yeah, the problem is that OVS does not do 1) when tap device gets destroyed.

Interesting. So what happens if there is traffic for a port on the switch that has a now-nonexistent tap device? Does it ignore it? Explode?

To be more precise OVS has two parts: 1. user-space (database that contains bridge configuration); and 2. kernel-space (the actual state of the bridge) When I say that the port remains attached to the bridge after tap device is closed, I mean that this port config is still in user-space DB. If tap device with exactly the same name would reappear later, then the same config would be reapplied and kernel-space would end up doing exactly the the same thing.

...

...
If a non-persistent tap device that is attached to an OVS is closed, does OVS not notice this and automatically detach it? You may want to experiment with that; possibly nothing is needed.

...
(it would be much better if not, because otherwise there will need to be special care taken to prevent dangling tap devices (or dangling references to deleted tap devices))

The difference between OVS and

...
Linux Bridge is that OVS will need a hook that removes all ports on VM shutdown event (and maybe also for some other events?).

Not just when a guest is shutdown, but also if a network device is detached from a running domain.

If it's necessary to explicitly detach the tap from the OVS, whatever hook is added in to do that can hopefully just as well be identical for a Linux bridge (i.e., the only OVS-specific code should be in the lowest level function that does that bridge detach).

Another point - since a shutdown initiated by the guest would likely end up destroying the tap device, we can't just add in a hook to detach it from the bridge - too early and the guest won't be done with it yet, too late and it will already not exist. I'm thinking that instead we may need to create the tap as persistent, then explicitly detach it from the bridge and delete it after the domain is finished with it.

It wouldn't be too late. It's ok If actual tap device is not alive anymore.

Well, as long as there are no negative consequences to the port being assigned past the time when the tap device is deleted.

As I mentioned before, you should modify virNetDevBridgeRemovePort to do this removal (and do it appropriately depending on the type of bridge), but change it so that it should return success if it would fail simply because the tap device already is not on the bridge. (This way we can leave the tap device as non-persistent, and it will be an effective NOP for tap devices on linux bridges.

Also, for consistency we should be just always calling the function to detach from the bridge if virDomainNetGetActualType(net) == VIR_DOMAIN_NET_TYPE_BRIDGE, regardless of whether it's a linux bridge or ovs.

As far as where to put the calls to this - look for where there are calls to networkReleaseActualDevice(), and do it up above that (for example, you can see in qemuDomainDetachNetDevice() how there is an "if virDomainNetGetActualType(detach) == VIR_DOMAIN_NET_TYPE_DIRECT)" - you can change that to a switch(actualtype), and add a case for VIR_DOMAIN_NET_TYPE_BRIDGE that calls virNetDevBridgeRemovePort().

Ok. Thanks for the hints!

...

If that is needed, I think it should be done in a separate patch that

...
is a prerequisite to the OVS patch. That way the two things can be tested independent of each other.

As soon as I get out the patch I'm working on now, I'll take a quick look at this and see if I can point your more in the right direction for this prerequisite patch. In the meantime, it would be useful if you could do the experiment I mentioned above (i.e. do nothing and see if it explodes), and modify virNetDevBridgeRemovePort for your patch to do the right thing in the case of the bridge device being an OVS.

If by experiment you meant "Whether OVS automatically detaches tap device from OVS bridge when tap device gets closed?" then I can confirm that in contrast to Linux Bridge it does not do that. I will look into possibilities to remove ports on "detach-interface" and "VM shutdown" events.

"detach-interface" = qemuDomainDetachNetDevice() - see above.

"VM shutdown" = qemuProcessStop() - almost exactly the same situation as detach (turn the if() into a switch() and add a case for bridged devices).

Don't forget LXC support! :-) It can use bridge network devices too.

There are a few other places where it may be appropriate to do the bridge removal during error paths; this same search may show you some of them, and some others may show up when you search for where virNetDevTapCreateInBridgePort.

Ansis Atteka

4 Feb 4 Feb

12:42 a.m.

On Thu, Feb 2, 2012 at 10:47 PM, Ansis Atteka <aatteka@nicira.com> wrote:

...

On Thu, Feb 2, 2012 at 10:10 PM, Laine Stump <laine@laine.org> wrote:

...
On 02/02/2012 01:28 PM, Ansis Atteka wrote:

On Thu, Feb 2, 2012 at 6:08 PM, Laine Stump <laine@laine.org> wrote:

...
On 02/02/2012 09:30 AM, Ansis Atteka wrote:

...
Libvirt has a function virNetDevBridgeRemovePort() which can remove port from the Linux Bridge, but it seems that no one calls it.

Wanted to confirm if port removal happens automatically for Linux Bridges if VM goes down?

* when it's time to detach the device or destroy the guest, libvirt just sends a monitor command to qemu, which ends up closing the tap device. Because the tap device was non-persistent, that automatically leads to 1) removal of the tap device from the bridge, and 2) deletion of the tap device itself.

Yeah, the problem is that OVS does not do 1) when tap device gets destroyed.

Interesting. So what happens if there is traffic for a port on the switch that has a now-nonexistent tap device? Does it ignore it? Explode?

To be more precise OVS has two parts:

1. user-space (database that contains bridge configuration); and 2. kernel-space (the actual state of the bridge)

When I say that the port remains attached to the bridge after tap device is closed, I mean that this port config is still in user-space DB. If tap device with exactly the same name would reappear later, then the same config would be reapplied and kernel-space would end up doing exactly the the same thing.

...
...
If a non-persistent tap device that is attached to an OVS is closed, does OVS not notice this and automatically detach it? You may want to experiment with that; possibly nothing is needed.

...
(it would be much better if not, because otherwise there will need to be special care taken to prevent dangling tap devices (or dangling references to deleted tap devices))

The difference between OVS and

...
Linux Bridge is that OVS will need a hook that removes all ports on VM shutdown event (and maybe also for some other events?).

Not just when a guest is shutdown, but also if a network device is detached from a running domain.

If it's necessary to explicitly detach the tap from the OVS, whatever hook is added in to do that can hopefully just as well be identical for a Linux bridge (i.e., the only OVS-specific code should be in the lowest level function that does that bridge detach).

Another point - since a shutdown initiated by the guest would likely end up destroying the tap device, we can't just add in a hook to detach it from the bridge - too early and the guest won't be done with it yet, too late and it will already not exist. I'm thinking that instead we may need to create the tap as persistent, then explicitly detach it from the bridge and delete it after the domain is finished with it.

It wouldn't be too late. It's ok If actual tap device is not alive anymore.

Well, as long as there are no negative consequences to the port being assigned past the time when the tap device is deleted.

As I mentioned before, you should modify virNetDevBridgeRemovePort to do this removal (and do it appropriately depending on the type of bridge), but change it so that it should return success if it would fail simply because the tap device already is not on the bridge. (This way we can leave the tap device as non-persistent, and it will be an effective NOP for tap devices on linux bridges.

Also, for consistency we should be just always calling the function to detach from the bridge if virDomainNetGetActualType(net) == VIR_DOMAIN_NET_TYPE_BRIDGE, regardless of whether it's a linux bridge or ovs.

As far as where to put the calls to this - look for where there are calls to networkReleaseActualDevice(), and do it up above that (for example, you can see in qemuDomainDetachNetDevice() how there is an "if virDomainNetGetActualType(detach) == VIR_DOMAIN_NET_TYPE_DIRECT)" - you can change that to a switch(actualtype), and add a case for VIR_DOMAIN_NET_TYPE_BRIDGE that calls virNetDevBridgeRemovePort().

Ok. Thanks for the hints!

...
If that is needed, I think it should be done in a separate patch that

...
is a prerequisite to the OVS patch. That way the two things can be tested independent of each other.

As soon as I get out the patch I'm working on now, I'll take a quick look at this and see if I can point your more in the right direction for this prerequisite patch. In the meantime, it would be useful if you could do the experiment I mentioned above (i.e. do nothing and see if it explodes), and modify virNetDevBridgeRemovePort for your patch to do the right thing in the case of the bridge device being an OVS.

If by experiment you meant "Whether OVS automatically detaches tap device from OVS bridge when tap device gets closed?" then I can confirm that in contrast to Linux Bridge it does not do that. I will look into possibilities to remove ports on "detach-interface" and "VM shutdown" events.

"detach-interface" = qemuDomainDetachNetDevice() - see above.

"VM shutdown" = qemuProcessStop() - almost exactly the same situation as detach (turn the if() into a switch() and add a case for bridged devices).

Don't forget LXC support! :-) It can use bridge network devices too.

There are a few other places where it may be appropriate to do the bridge removal during error paths; this same search may show you some of them, and some others may show up when you search for where virNetDevTapCreateInBridgePort.

Wouldn't it be simpler to do port removal just inside the

networkReleaseActualDevice() function if this is interface that was attached to an OVS bridge? Would this make any problems to the overall design? The code seems to work...

Laine Stump

2:18 a.m.

On 02/03/2012 10:42 AM, Ansis Atteka wrote:

...

On Thu, Feb 2, 2012 at 10:10 PM, Laine Stump <laine@laine.org <mailto:laine@laine.org>> wrote:

There are a few other places where it may be appropriate to do the bridge removal during error paths; this same search may show you some of them, and some others may show up when you search for where virNetDevTapCreateInBridgePort.

...

Wouldn't it be simpler to do port removal just inside the networkReleaseActualDevice() function if this is interface that was attached to an OVS bridge? Would this make any problems to the overall design? The code seems to work...

No. networkAllocateActualDevice and networkReleaseActualDevice are only intended to acquire network-specific configuration/resourced for the guest from a <network> object, and release it later; they shouldn't be used as a clearing house for general guest interface setup/teardown. Looking through all this code I've been thinking it might be nice to create such "clearing house" setup/teardown functions, but what I've seen so far is that there are places where only some of the full list of functions are called (don't know if that means missing functionality / bugs, or genuinely different situations that require a different set of actions), and am too busy now with other projects to investigate further. In the meantime, if you can try to make your additions in the (at least) two places as similar as possible, that will make it easier for us to later go in and create functions that do complete interface setup/teardown.

Ansis Atteka

5:03 a.m.

Laine, How do I make interfaceid persistent across libvirtd restarts if user did not specify one in domain XML config file and libvirtd generated it for him automatically? It seems that different interfaceid is regenerated each time the libvirtd is restarted. If I understand correctly then two XML files come into picture: 1. /usr/local/etc/libvirt/qemu/VM1.xml, which stores persistent *config*for the whole domain life-time (without UUID); and 2. /usr/local/var/run/libvirt/qemu/VM1.xml, which stores the *instantiated config* with the generated UUID after reading the original *config*file. But unfortunately after libvirtd is restarted this file gets destroyed and has different UUID next time Domain is started. I guess that my goal would be to update file #1 with the generate UUID. Is that supported or would the user have to manually specify the UUID in * config* XML? Thanks, Ansis On Fri, Feb 3, 2012 at 7:18 PM, Laine Stump <laine@laine.org> wrote:

...

On 02/03/2012 10:42 AM, Ansis Atteka wrote:

On Thu, Feb 2, 2012 at 10:10 PM, Laine Stump <laine@laine.org> wrote:

...
...
There are a few other places where it may be appropriate to do the bridge removal during error paths; this same search may show you some of them, and some others may show up when you search for where virNetDevTapCreateInBridgePort.

Wouldn't it be simpler to do port removal just inside the networkReleaseActualDevice() function if this is interface that was attached to an OVS bridge? Would this make any problems to the overall design? The code seems to work...

No. networkAllocateActualDevice and networkReleaseActualDevice are only intended to acquire network-specific configuration/resourced for the guest from a <network> object, and release it later; they shouldn't be used as a clearing house for general guest interface setup/teardown.

Looking through all this code I've been thinking it might be nice to create such "clearing house" setup/teardown functions, but what I've seen so far is that there are places where only some of the full list of functions are called (don't know if that means missing functionality / bugs, or genuinely different situations that require a different set of actions), and am too busy now with other projects to investigate further.

In the meantime, if you can try to make your additions in the (at least) two places as similar as possible, that will make it easier for us to later go in and create functions that do complete interface setup/teardown.

5054

Age (days ago)

5061

Last active (days ago)

List overview

Download

22 comments

7 participants

participants (7)

Ansis Atteka
Dan Wendlandt
Daniel P. Berrange
kmestery
Kyle Mestery
Laine Stump
Roopa Prabhu