Stefan Bader wrote:
> On 18.12.2013 14:28, Ian Campbell wrote:
>
>> On Wed, 2013-12-18 at 14:12 +0100, Stefan Bader wrote:
>>
>>> On 18.12.2013 13:27, Ian Campbell wrote:
>>>
>>>> On Tue, 2013-12-17 at 18:32 +0100, Stefan Bader wrote:
>>>>
>>>>>> Might this libxl fix be relevant:
>>>>>> commit 5420f26507fc5c9853eb1076401a8658d72669da
>>>>>> Author: Jim Fehlig <jfehlig(a)suse.com>
>>>>>> Date: Fri Jan 11 12:22:26 2013 +0000
>>>>>>
>>>>>> libxl: Set vfb and vkb devid if not done so by the
caller
>>>>>>
>>>>>> Other devices set a sensible devid if the caller has
not done so.
>>>>>> Do the same for vfb and vkb. While at it, factor
out the common code
>>>>>> used to determine a sensible devid, so it can be
used by other
>>>>>> libxl__device_*_add functions.
>>>>>>
>>>>>> Signed-off-by: Jim Fehlig <jfehlig(a)suse.com>
>>>>>> Acked-by: Ian Campbell
<ian.campbell(a)citrix.com>
>>>>>> Committed-by: Ian Campbell
<ian.campbell(a)citrix.com>
>>>>>>
>>>>>> and a follow up in dfeccbeaa. Although the comment implies that
nic's
>>>>>> were already correctly assigning a devid if the caller specified
-1, so
>>>>>> I don't know why it doesn't work for you :-(
>>>>>>
>>>>> Ok, yes, the commit above indeed changes libxl__device_nic_add to
call
>>>>> libxl__device_nextid for the devid... Just how is this actually
called.
>>>>> Maybe not sufficient but "git grep libxl__device_nic_add"
in the xen code only
>>>>> shows the definition and a declaration in libxl_internal.h to me...
>>>>>
>>>> I have a feeling a macro might be involved...
>>>>
>>>> Here we go, look for DEFINE_DEVICE_REMOVE in libxl.c. We should really
>>>> add the eventual function names in comments to provide grep fodder....
>>>>
>>> Oh duh, yeah. So in DEFINE_DEVICE_ADD a libxl_device_nic_add is created
which
>>> calls to libxl__device_nic_add. When I look for the single _ version I find
a
>>> call from xl_cmdimpl.c and its public declaration in libxl.h.
>>> So I guess the bug is that libvirt in the libxl driver never seems to do so
>>>
>> The macro creates libxl__add_nics which adds the nics from the
>> libxl_domain_config->nics array. I don't think libvirt needs to call
>> libxl_device_nic_add manually unless it is hotplugging a new nic at
>> runtime.
>>
>>
>
> Hm, so I think this is the path:
>
> libxl_domain_create_new
> -> do_domain_create
> -> initiate_domain_create
> -> libxl__bootloader_run (HVM domain, skipping bootloader)
> <- domcreate_bootloader_done
> -> domcreate_rebuild_done
> <- domcreate_launch_dm
> -> libxl__spawn_local_dm
> <- domcreate_devmodel_started
>
> In libxl__spawn_local_dm, there is the following loop:
>
> for (i = 0; i < d_config->num_nics; i++) {
> /* We have to init the nic here, because we still haven't
> * called libxl_device_nic_add at this point, but qemu needs
> * the nic information to be complete.
> */
> ret = libxl__device_nic_setdefault(gc, &d_config->nics[i], domid);
> if (ret)
> goto error_out;
> }
>
> So I think when starting the dm, the devid just is not set as setdefault does
> not seem to do so. I would be done in the later domcreate_devmodel_started
> callback but that is too late for the generated qemu arguments.
>
Sorry for jumping in late...
I stumbled across this problem just before openSUSE13.1 released and did
a quick fix in libvirt
https://build.opensuse.org/package/view_file/Virtualization:openSUSE13.1/...
I removed setting the NIC devid in the libxl driver a while back to be
consistent with other devices
http://libvirt.org/git/?p=libvirt.git;a=commit;h=ba64b97134a6129a48684f22...
The quick fix was to essentially revert the above commit until I could
investigate further. Thank you for now having done that investigation
:). Can the devid assignment logic be moved from
libxl__device_nic_add() to libxl__device_nic_setdefault()?
It certainly seems like it would be more natural to do it there.
I suspect it might be done this way because at setdefault time you might
be walking a list of nics none of which have been created yet -- so
looking in xenstore would return "devid zero is free" for every one of
them?
How about we:
* move the init to setdefault to catch the single NIC added via
hotplug case
* we add somewhere early in the domain create path a call to a
function which assigns devids to an entire array of devices (and
do it for all the different device types). Perhaps in
initiate_domain_create() after the calls to
libxl__domain_create_info_setdefault and
libxl__domain_build_info_setdefault but before the loop calling
libxl__device_disk_setdefault for the disks.
* perhaps that same function should call setdefault too, after
having assigned the device, rather than it being done later in
an adhoc way?
Does that sound at all plausible?
Ian.