On 05/21/2013 11:34 AM, Daniel P. Berrange wrote:
On Tue, May 21, 2013 at 11:29:26AM -0400, Laine Stump wrote:
> On 05/21/2013 07:44 AM, Ján Tomko wrote:
>> On 05/21/2013 01:37 PM, Laine Stump wrote:
>>> On 05/21/2013 04:03 AM, Ján Tomko wrote:
>>>> On 05/21/2013 09:32 AM, Dominik Mostowiec wrote:
>>>>> hi,
>>>>> I try to add 2 VF by "hostdev".
>>>>> Networks (vnet0, vnet1) with:
>>>>> <forward mode='hostdev' managed='yes'>
>>>>> <pf dev='eth1'/>
>>>>> .....
>>>>>
>>>>> Domain:
>>>>> <interface type="network">
>>>>> <source network="vnet0"/>
>>>>> ....
>>>>> <interface type="network">
>>>>> <source network="vnet1"/>
>>>>> ....
>>>>>
>>>>> virsh create error:
>>>>> "error: internal error process exited while connecting to
monitor: kvm:
>>>>> -device
pci-assign,configfd=25,host=01:10.1,id=hostdev0,bus=pci.0,addr=0x4:
>>>>> Duplicate ID 'hostdev0' for device"
>>>>>
>>>> Hi,
>>>>
>>>> it seems we have been assigning the same id to all network hostdevs until
this
>>>> recent commit (not yet released):
>>>>
>>>>
http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=6597cc25
>>> If that patch has such an effect, it's purely accidental. Have you
>>> tested this? (I plan to test a before and after as soon as I've had
>>> breakfast)
>>>
>> I haven't tested it and looking at the code again, I might've been wrong
:(
>>
>> Sorry about that.
> Nothing to be sorry about even if you weren't right. And as it turns out
> you *were* right!
>
> I was curious why, so I tested with a 1.0.4 build running under gdb. The
> reason for the failure was that the person who wrote the code in
> qemuBuildCommandLine that called qemuAssignDeviceHostdevAlias() (as well
> as the person who reviewed that code, which would be me :-/) missed a
> detail of qemuAssignDeviceHostdevAlias() - it only searches for a free
> alias index if you send -1 as the index, otherwise it just uses the
> index you send. This is what was being called:
>
> qemuAssignDeviceHostdevAlias(def, hostdev, (def->nhostdevs-1))
>
> So the first time def->nhostdevs-1 == -1, and a search would be made in
> the (empty) hostdevs array, resulting in an index of 0 being used. The
> second time it's called, def->nhostdevs-1 == 0, so it would just use 0
> without even checking for a conflict.
>
> My patch for VFIO (the one Jan pointed out as being a fix) completely
> removed this code from qemuBuildCommandLine(), relying instead on the
> alias being assigned with all other aliases in
> qemuAssignDeviceAliases(). (I didn't do this on purpose to fix this bug
> though; I did it because the old code had become redundant).
>
> Anyway, thanks for being so observant. I otherwise would have wasted
> investigation time wondering why it wasn't broken for me.
So the fact that we missed this suggests we have a gap in our XML->ARGV
test coverage for host device assignment, right ?
It seems you're on a mission today :-)
Yes. The problem with network commandlines has been that the setup of
the tap devices has always been depply intertwined with the commandline
generation, making it impossible to test from make check. It's been an
annoyance to me for a long time, but finally got to my conscious brain
last month when I put in the vfio patches. (this was the side topic I
opened when Osier sent his patches that add a callback object to the
qemuBuildCommandLine() args).
This is also a problem with generic legacy hostdev devices (generic vfio
host devices don't seem to have a problem), but their only need is a bit
of sysfs.
I would actually prefer the network device host-side setup to be handled
completely outside of qemuBuildCommandLine() in
qemuNetworkPrepareDevices() (as is done for usb and pci passthrough
devices) rather than in a callback, but that would require sending an
array of file descriptors into qemuBuildCommandLine(), and there's no
convenient place to do that. (Note that the arg "vmop" is also only
needed for network device host-side setup, and so moving that setup
outside of qemuBuildCommandline() would eliminate the need for that arg).
You've said that you're against adding hidden driver specific state
information to the virDomainDef (which makes sense. although I will
point out that there is already both driver-specific (everything in
<driver>) and state data (the <actual> subelement which isn't
driver-specific, but describes what type of network device was actually
assigned to the guest) in the virDomainDef), so we're left with either:
1) moving the network device "preparation" into qemuNetworkPrepareDevices,
then replacing "vmop" with a "networkData" arg (or maybe
"deviceData",
so that it can easily have similar stuff for other devices added to it)
that contains all the fd's for the tap and vhost-net devices.
or
2) moving the network device preparation into a callback that's called
from qemuBuildCommandLine(), and keeping vmop simply to pass it into
this callback.
(unless someone else has a better idea). My opinion is that option (1)
is better, because it puts all the netdev setup in a single place
(rather than some being in qemuNetworkPrepareDevices() and some in a new
callback function), it doesn't add to the qemuBuildCommandLine()
arglist, since it removes one arg while adding another, and it makes the
netdev setup more symmetrical with error-recovery teardown, which is
also called from qemuProcessStart() (as is qemuNetworkPrepareDevices()).
As you can tell, I'm still resistant to have a callback (even though the
infrastructure is already there), because I think that's still giving
qemuBuildCommandLine() an unlisted side effect of messing around with
device setup, while it should be simply taking some input and using it
to construct a commandline (I don't really have a problem with simply
sending it a fake sysfs, as that is just const input data; what I don't
like is when it is creating tap devices, etc).