From: sendmail [mailto:justsendmailnothingelse@gmail.com] On Behalf Of Laine Stump
Sent: Saturday, September 03, 2016 10:50 PM
To: Libvirt <libvir-list@redhat.com>
Cc: Moshe Levi <moshele@mellanox.com>; Edan David <edand@mellanox.com>
Subject: Re: [libvirt] <interface type='direct'>

 

On 09/03/2016 11:08 AM, Moshe Levi wrote:

 
 
-----Original Message-----
From: sendmail [mailto:justsendmailnothingelse@gmail.com] On Behalf Of
Laine Stump
Sent: Thursday, September 01, 2016 5:59 PM
To: Libvirt <libvir-list@redhat.com>
Cc: Moshe Levi <moshele@mellanox.com>; Edan David
<edand@mellanox.com>
Subject: Re: [libvirt] <interface type='direct'>
 
On 09/01/2016 04:05 AM, Moshe Levi wrote:
Hi,
 
In OpenStack we have a port type macvtap.
A macvtap port is just a tap device connected to a VF.
 
In Libvirt the guest XML looks like:
<interface type='direct'>
   <mac address='fa:16:3e:b1:06:4e'/>
   <source dev='p1p6' mode='passthrough'/>
   <target dev='macvtap1'/>
   <model type='virtio'/>
   <driver name='vhost'/>
   <alias name='net0'/>
   <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>
 
 
In the hypervisor we can see that the MAC of the VF, which is
fa:16:3e:f3:9b:e8, is set by OpenStack (see [1]):
9: ens3f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovs-system state UP mode DEFAULT group default qlen 1000
     link/ether 7c:fe:90:29:24:4e brd ff:ff:ff:ff:ff:ff
     vf 0 MAC 00:00:00:00:00:00, spoof checking off, link-state disable
     vf 1 MAC 00:00:00:00:00:00, spoof checking off, link-state disable
     vf 2 MAC fa:16:3e:f3:9b:e8, vlan 48, spoof checking on, link-state enable
     vf 3 MAC fa:16:3e:f6:02:c8, vlan 48, spoof checking on, link-state enable
41: ens3f4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
     link/ether fa:16:3e:f6:02:c8 brd ff:ff:ff:ff:ff:ff
42: macvtap0@ens3f4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 500
     link/ether fa:16:3e:f6:02:c8, brd ff:ff:ff:ff:ff:ff
 
The netdevice of the VF, ens3f4, also has the same MAC. This MAC is set
when using Libvirt 1.2.2 (Ubuntu 14.04), but when we tested with newer
Libvirt versions >= 1.2.17 (Fedora 23/Ubuntu 16.04) the MAC of the VF
netdevice (ens3f4) is not set.
This change in Libvirt prevents the guest from getting DHCP in OpenStack.
Do you know why the behavior changed in newer releases?
 
The MAC address is now set with a netlink command to set the VFINFO of
the particular VF# of the PF. This change was made in response to a bug
report stating that once the MAC address had been set for a hostdev
assignment of a VF (in which case this method is required), it was no longer
possible to set the MAC address for macvtap passthrough (the VF driver
would complain "MAC has been administratively set", on Intel igbvf at least).
Unfortunately I recently found that when you set the MAC address in this
manner, it doesn't take effect on the actual device - it's only saved in
memory to be applied the *next time* the host driver is rebound to the VF.
Are you saying that the change was to update the MAC of the VF?
So I don’t understand how this affects the issue of the VF netdevice MAC not getting set.


Look at the explanation in commit cb3fe38c and also
https://bugzilla.redhat.com/show_bug.cgi?id=1113474

 


That commit switched from using a simple ioctl(SIOCSIFHWADDR) to the VF's netdev name, to using a netlink RTM_SETLINK message to the netdev of the *PF* for the given VF.

This was done because the latter is the *only* way you can set the MAC address for a VF that you're going to assign to the guest with vfio device assignment, and once you've set the MAC address that way, future attempts to set the MAC address with ioctl(SIOCSIFHWADDR) result in failure and a kernel log like this:


kernel: igb 0000:0e:00.1: VF 1 attempted to override administratively set MAC address
kernel: Reload the VF driver to resume operations

Looking into the kernel, it appears that once the MAC address for a VF has been set via RTM_SETLINK, the igb driver (and I believe also the ixgbe driver, not sure about others) doesn't allow it to be changed via ioctl until the PF driver is reloaded (which can't realistically be done on an active system).
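For illustration only, here is a rough Python sketch of what the older ioctl(SIOCSIFHWADDR) approach amounts to when aimed at the VF's own netdev. It is not libvirt's code (libvirt is written in C). The helper names are made up for this example, and the 40-byte struct ifreq size is an assumption that holds on 64-bit Linux.

```python
import fcntl
import socket
import struct

SIOCSIFHWADDR = 0x8924  # from <linux/sockios.h>
ARPHRD_ETHER = 1        # from <linux/if_arp.h>
IFREQ_SIZE = 40         # assumption: sizeof(struct ifreq) on 64-bit Linux


def build_hwaddr_ifreq(ifname: str, mac: str) -> bytes:
    """Pack a struct ifreq carrying an Ethernet hardware address.

    Hypothetical helper for this sketch, not a libvirt function.
    """
    name = ifname.encode("ascii")
    if len(name) > 15:
        raise ValueError("interface name too long")
    raw_mac = bytes(int(octet, 16) for octet in mac.split(":"))
    if len(raw_mac) != 6:
        raise ValueError("expected a 6-octet MAC address")
    # Layout: ifr_name[16], then a sockaddr (sa_family, sa_data holding the MAC).
    packed = struct.pack("16sH6s", name, ARPHRD_ETHER, raw_mac)
    return packed.ljust(IFREQ_SIZE, b"\0")


def set_mac_via_ioctl(vf_ifname: str, mac: str) -> None:
    """Set the MAC directly on the VF's netdev (needs CAP_NET_ADMIN)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        fcntl.ioctl(sock.fileno(), SIOCSIFHWADDR,
                    build_hwaddr_ifreq(vf_ifname, mac))
```

Per the kernel log quoted above, a call like set_mac_via_ioctl("ens3f4", "fa:16:3e:f6:02:c8") would be expected to fail on igb once the MAC has been administratively set through the PF.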


But recently there was another report of the MAC address not getting set properly for macvtap passthrough mode when the device is an SRIOV VF (I can't find it in bugzilla, so it must have been an email to one of the lists), and when I tried it myself I found they were correct. In the output of "ip link show", the MAC address shown in the list of VFs under the PF is correctly modified, but it's not set properly in the VF's netdev instance. Apparently the MAC addresses in the VF list aren't set in the VF's netdev immediately; they're just saved to be set *the next time the VF is re-bound to the VF netdev driver*. I think in the past the interface may have been in promiscuous mode so it didn't matter, but now it isn't? I'm not sure as I haven't had much time to investigate.
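As a rough way to spot that mismatch, the following Python sketch parses "ip link show" text and compares the MAC listed for a VF under the PF with the MAC on the VF's own netdev. The function names are hypothetical, and the sketch assumes the caller already knows which VF number belongs to which netdev (the ip output alone does not say).

```python
import re
from typing import Dict, Optional

VF_LINE = re.compile(r"^\s*vf (\d+) MAC ([0-9a-f:]{17})", re.IGNORECASE)
ETHER_LINE = re.compile(r"^\s*link/ether ([0-9a-f:]{17})", re.IGNORECASE)


def vf_macs(pf_output: str) -> Dict[int, str]:
    """Map VF number -> MAC from the 'vf N MAC ...' lines under a PF."""
    macs = {}
    for line in pf_output.splitlines():
        m = VF_LINE.match(line)
        if m:
            macs[int(m.group(1))] = m.group(2).lower()
    return macs


def netdev_mac(dev_output: str) -> Optional[str]:
    """Return the first link/ether address in 'ip link show DEV' output."""
    for line in dev_output.splitlines():
        m = ETHER_LINE.match(line)
        if m:
            return m.group(1).lower()
    return None


def mac_applied(pf_output: str, vf_num: int, vf_dev_output: str) -> bool:
    """True if the MAC recorded for the VF under the PF is also on its netdev."""
    listed = vf_macs(pf_output).get(vf_num)
    return listed is not None and listed == netdev_mac(vf_dev_output)
```

With the affected libvirt versions, mac_applied() would be expected to return False right after the guest starts, since only the PF's VF list has been updated.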

Does that make any more sense now?

 

Yes, thanks :)

 

Since I don't see a reasonably efficient way to get this to work, I need to
make a patch to revert to the old behavior, and we'll then just have to tell
people "If you do hostdev device assignment of VFs, then you can't later
re-use the same device for macvtap passthrough mode".
 
(actually, I *think* an alternative would be to unbind/rebind the host driver
to the VF after setting the VF MAC address, but that seems a bit
disruptive/extreme to work around a problem that is probably only seen in
QE labs, but not in the real world; realistically, production systems likely use
either hostdev or macvtap, and don't switch back and forth between them.)
 
A question - I notice you have the vlan set for the VF. Does *that* properly
take effect? (it's set in the same manner as the MAC address, via a netlink
command to set the VFINFO)
I am not sure what you mean, but we set the vlan in OpenStack after we create the guest XML.


What command do you use to set it? Do you use "ip link set $PF vf $VF# vlan $VLANID" ? I think that's what it's showing here:

Yes we use ip link set $PF vlan $VLANID to set the vlan.

So in the guest XML we don’t put the vlan id for <interface type='direct'>, only for <interface type='hostdev'>. Are you saying that it should be supported in both? Should I open a bug for this as well?


https://review.openstack.org/#/c/364121/1/nova/network/linux_net.py

(I don't know my way around openstack code, but arrived at that page via clicking on links from a google search)


 
In OpenStack we set the MAC of the VF and the vlan using iproute2.
I just want to know whether setting the mac/vlan should be part of Libvirt, or
whether Libvirt should just create the macvtap interface and we should set the mac/vlan?


libvirt *should* do it.


 
 
We have a WIP patch in OpenStack for also setting the MAC for the
netdevice of the VF [2]. Just wanted to know whether this is the correct
approach.
Can you confirm that setting the VF netdevice mac in OpenStack is a reasonable workaround for the newer Libvirt versions?


If libvirt isn't getting the job done, and you can set it yourself, then that's a workaround. I don't know that I'd call it "reasonable" though. If everybody puts in special code to work around bugs in libvirt (which is apparently what's been done) rather than actually reporting the bug (what you're doing now - Thanks!) then we are tricked into thinking that either the code works, or that nobody is using it so it doesn't matter if it's broken.

Sure, I would like to get the fix in Libvirt so I opened this bug https://bugzilla.redhat.com/show_bug.cgi?id=1372944

But then again I can’t assume that everyone will use the latest Libvirt, so I will put a workaround in OpenStack with a TODO for removal.


The *best* way of overcoming this problem is to fix libvirt so it does what it's supposed to do.

It's possible we can make it work by adding some operation after we send the RTM_SETLINK (maybe unbind the VF from its netdev driver, then re-bind, but that seems so drastic and time consuming!), or maybe we'll have to revert to using ioctl(SIOCSIFHWADDR), but of course that will fail if the interface has been used for hostdev assignment since the last host reboot.
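To make the unbind/re-bind idea concrete, here is a minimal Python sketch using the generic sysfs PCI driver "unbind" and "bind" files. It is only an illustration of the operation, not something libvirt does. The function name and the PCI address in the usage note are hypothetical, and the sysfs_root parameter exists only so the sketch can be tried against a scratch directory.

```python
from pathlib import Path


def rebind_vf(pci_addr: str, sysfs_root: str = "/sys") -> str:
    """Unbind a PCI device from its current driver, then bind it again.

    Returns the driver name. Needs root on a real system, and the VF's
    netdev disappears and reappears while this runs.
    """
    dev = Path(sysfs_root) / "bus" / "pci" / "devices" / pci_addr
    # 'driver' is a symlink to the bound driver's directory. Resolve it
    # first, because the link goes away once the device is unbound.
    driver_dir = (dev / "driver").resolve()
    (driver_dir / "unbind").write_text(pci_addr)
    (driver_dir / "bind").write_text(pci_addr)
    return driver_dir.name
```

For example, rebind_vf("0000:0e:10.1") would cycle the driver on a VF at that (made-up) address, after which a MAC saved via RTM_SETLINK should, per the behavior described above, show up on the VF's netdev.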

It's interesting that openstack is apparently using the RTM_SETLINK method to set the mac address (afaik, that's what is used by the "ip link set $pf_ifname vf $vf_num mac $mac_addr vlan $vlanid" command that's shown in the bit of code from nova/network/linux_net.py at the link I posted above).
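For reference, a small Python sketch of how a caller might assemble that iproute2 command. This is not the nova code, and the function names are hypothetical. It only builds and runs the "ip link set $pf_ifname vf $vf_num mac $mac_addr vlan $vlanid" form quoted above.

```python
import subprocess
from typing import List, Optional


def vf_set_command(pf_ifname: str, vf_num: int,
                   mac: Optional[str] = None,
                   vlan: Optional[int] = None) -> List[str]:
    """Build the argv for 'ip link set PF vf N [mac MAC] [vlan ID]'."""
    cmd = ["ip", "link", "set", pf_ifname, "vf", str(vf_num)]
    if mac is not None:
        cmd += ["mac", mac]
    if vlan is not None:
        cmd += ["vlan", str(vlan)]
    return cmd


def apply_vf_settings(pf_ifname: str, vf_num: int,
                      mac: Optional[str] = None,
                      vlan: Optional[int] = None) -> None:
    """Run the command (needs root); raises CalledProcessError on failure."""
    subprocess.run(vf_set_command(pf_ifname, vf_num, mac, vlan), check=True)
```

Since this goes through the PF, it would presumably hit the same delayed-apply behavior on the VF's netdev that libvirt does.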